FOREWORD


The  Navy  Personnel  Research  and  Development  Center  is  the  lead 
laboratory  for  the  Enhanced  Computerized  Aptitude  Testing  (ECAT)  project.  The 
purpose  of  the  project  is  to  assess  the  cost/benefits  of  adding  new  aptitude 
tests  to  the  Armed  Services  Vocational  Aptitude  Battery  (ASVAB).  This  report 
solves  the  important  problem  of  how  to  combine  the  results  from  different 
studies  with  different  criteria  in  order  to  arrive  at  estimates  of  the 
incremental  validity  of  adding  new  tests  to  the  ASVAB.  The  issue  is  of 
practical  importance  because  many  of  the  samples  under  investigation  are  too 
small  to  allow  firm  conclusions  to  be  drawn  unless  their  data  are  combined 
with  those  of  other  samples.  This  report  will  be  useful  both  to  military 
personnel  researchers  and  to  a  broad  civilian  research  community  concerned 
with  the  validity  of  aptitude  tests. 

This  effort  was  conducted  under  the  ECAT  project  sponsored  by  the  Office 
of  the  Assistant  Secretary  of  Defense  (Force  Management  &  Personnel,  Military 
Manpower  &  Personnel  Policy).  It  was  funded  by  Headquarters,  U.S.  Military 
Entrance  Processing  Command  (USMEPCOM)  with  U.  S.  Army  Operations  and 
Maintenance  funds  (MIPR  a9-R-114).  The  report  was  written  under  the  Army 
Research  Office  contract  DAAL03-86-D-0001,  TCN  89-517,  D.O.  1723  with  Battelle 
Memorial  Institute. 

John  H.  Wolfe  was  the  Contracting  Officer's  Technical  Representative 
(COTR)  for  the  task. 


THOMAS F. FINLEY
Captain, U.S. Navy
Commanding Officer

RICHARD C. SORENSON
Technical Director (Acting)


SUMMARY 


Problem

The Navy Personnel Research and Development Center (NPRDC) has developed new aptitude tests for possible addition to the Armed Services Vocational Aptitude Battery (ASVAB). A computerized version of the ASVAB (the CAT-ASVAB) was also developed. Validity studies are currently underway to determine whether the new tests on the CAT-ASVAB produce an increment in validity computed using a job performance criterion. However, few single sites (schools) have a large enough sample to produce a sufficiently powerful test for the validity increment expected. Consequently, it will be necessary to pool information across sites to obtain a sufficiently powerful test for incremental validity. The methods for such pooling had not previously been developed.

Objective

The  objective  of  this  research  was  to  develop  statistical  methods  for 
pooling  estimates  of  incremental  validity  across  independent  studies  (sites), 
estimate  the  standard  errors  and  a  confidence  interval  for  the  pooled 
incremental  validity,  and  test  the  statistical  significance  of  the  incremental 
validity. 

Approach

A  search  of  the  literature  on  combining  statistical  estimates  was 
conducted  to  determine  applicable  methods.  The  statistical  literature  on  the 
sampling  theory  of  multiple  correlations  was  also  searched.  Mathematical 
(analytic)  methods  were  used  to  derive  procedures  that  were  not  previously 
available. 

Results 


Methods for obtaining the sampling distributions of incremental validities (the differences between multiple correlations) from the same and from independent samples were developed. These results were applied to yield methods for pooling incremental validities, testing the statistical significance of pooled validities, and constructing confidence intervals for the pooled incremental validity. They were also applied to a power analysis of the pooled test for incremental validity.

Conclusions

Pooling estimates across sites provides a viable strategy for estimating the incremental validity. If a single sample is used in each site to assess incremental validity, the test for the statistical significance of the pooled estimate will have adequate power to detect increments in validity of .02 with pooled sample sizes of N ≥ 4,000.
Estimates of the incremental validity of alternative test batteries should be based on pooled estimates derived from several samples, using the methods outlined in this report.
INTRODUCTION


Problem

Current  practice  involves  the  use  of  a  battery  of  tests  to  predict  a 
criterion.  We  want  to  determine  whether  adding  another  battery  of  tests  to 
the  operational  battery  improves  validity  and  by  what  amount.  Because  the 
increment  in  validity  expected  from  the  new  tests  is  small  (e.g.,  .02),  a 
sample  size  of  several  thousand  may  be  needed  to  detect  the  validity  increment 
with  high  statistical  power.  The  tests  are  used  in  a  variety  of  sites 
(schools),  but  few  single  schools  have  sufficiently  large  enrollments  to  carry 
out  a  powerful  study  of  the  improvements  in  validity  that  might  result  from 
the  addition  of  the  battery  of  new  tests. 

Alternatively,  current  practice  may  involve  the  use  of  one  test  battery 
to  predict  a  criterion  and  we  may  wish  to  know  if  an  alternative  test  battery 
has  greater  predictive  validity.  For  example,  we  may  wish  to  compare  the 
validity  of  a  paper  and  pencil  version  of  a  test  battery  to  that  of  a 
computerized  adaptive  version  of  the  same  test  battery. 

Objective 

In  these  situations,  pooling  of  information  on  incremental  validity 
across  several  sites  (e.g.,  schools)  may  provide  a  way  to  test  the  increment 
in  validity  with  high  statistical  power  and  to  estimate  precisely  the 
improvement  in  validity  that  results  from  the  addition  of  the  new  tests  or  the 
use  of  the  alternative  tests.  It  is  assumed  that  the  criterion  scores  used  in 
different  sites  are  too  dissimilar  to  permit  combination  of  raw  data.  This  is 
likely  to  be  the  case  when  the  criteria  are  training  grades,  performance 
ratings,  or  simulations  of  work  skills  that  are  unique  to  the  individual 
school  site. 

This  report  develops  a  method  for  combining  estimates  of  incremental 
validity  across  sites  to  obtain  the  most  precise  estimate  of  the  average 
incremental  validity.  It  also  provides  procedures  for  estimating  the  standard 
error  of  the  incremental  validity,  for  establishing  confidence  intervals  about 
the  incremental  validity,  and  for  testing  the  combined  significance  of  the 
validity-study results. Finally, the report shows how to estimate the variance in incremental validity parameters and how to test its statistical significance.

Background

The Navy Personnel Research and Development Center (NPRDC) has been engaged in a project to evaluate new aptitude tests that measure abilities not covered in the existing battery of ten tests in the Armed Services Vocational Aptitude Battery (ASVAB). It has also been engaged in the development of a computerized adaptive version of the ASVAB called the CAT-ASVAB. Validity studies are currently underway to determine the magnitude of incremental validity obtained by using the new tests as supplements to the ten operational ASVAB tests that are used as predictors of school and job performance. Navy studies should also provide data on the incremental validity of the CAT-ASVAB over the operational paper and pencil version of the test. The present report suggests methodology for carrying out studies of incremental validity in connection with these recent NPRDC developments.

INCREMENTAL  VALIDITY 

Given a single site, we might define the validity of test battery 1 as the multiple correlation $R_1$ of the $a$ tests in that battery with the criterion. Define the validity of test battery 2 (which may consist of the tests in battery 1 plus some new tests) as the multiple correlation $R_2$ of the tests in battery 2 with the criterion. It is helpful to distinguish the true or population values of validities from their sample estimates. Hence denote the sample estimates of the validities by $R_1$ and $R_2$ and denote the population values corresponding to these sample estimates by $\rho_1$ and $\rho_2$. The idea of incremental validity also arises in connection with the comparison of two alternative test batteries; for example, a paper and pencil test battery versus a computerized adaptive test battery. In this case, the two test batteries may not share any single test, and the validities $R_1$ and $\rho_1$ are the sample and population multiple correlations of the $a$ tests in battery 1 with the criterion. The sample and population validities of test battery 2, $R_2$ and $\rho_2$, respectively, are the sample and population multiple correlations of the $b$ tests in battery 2 with the criterion.

The  two  multiple  correlations  that  are  used  as  validity  coefficients  are 
stochastically  dependent  when  they  are  computed  from  measurements  on  the  same 
sample  of  individuals.  The  correlations  are  stochastically  independent  when 
they  are  computed  from  independent  samples. 

Testing the Statistical Significance of Incremental Validity at a Single Site

At each site, the incremental validity study compares a sample validity $R_1$ with another sample validity $R_2$ to determine whether $\rho_2$ is larger than $\rho_1$. Formally, this involves testing whether the population validity $\rho_2$ associated with $R_2$ exceeds the population validity $\rho_1$ associated with $R_1$; that is, a test of the hypothesis

$$H_0: \rho_2 = \rho_1$$

versus the alternative that $\rho_2 > \rho_1$.

Since $\rho_1$ and $\rho_2$ are nonnegative, $\rho_2 > \rho_1$ implies that $\rho_2^2 > \rho_1^2$ (and conversely); therefore a test for $\rho_2 > \rho_1$ is identical to a test for $\rho_2^2 > \rho_1^2$. The details of the hypothesis test depend on whether the same sample is used to compute both $R_1$ and $R_2$ or whether $R_1$ and $R_2$ are computed from independent samples.

Note  that  the  artifacts  of  criterion  unreliability  and  restriction  of 
range  do  not  alter  the  procedures  for  testing  hypotheses  about  incremental 
validity. 
$R_1$ and $R_2$ Computed from the Same Sample

Case 1. If $R_1$ and $R_2$ are computed from the same sample and if the predictors of $R_1$ are a subset of the predictors for $R_2$, then the appropriate test for incremental validity is the usual test for change in multiple correlation. Let $a$ be the number of tests used as predictors in $R_1$, let $b > a$ be the number of tests used as predictors in $R_2$, and let $n$ be the sample size. The test statistic is

$$F = \frac{(R_2^2 - R_1^2)(n - b - 1)}{(1 - R_2^2)(b - a)}, \qquad (1)$$

which is compared to the critical value for an F distribution with $(b - a)$ and $(n - b - 1)$ degrees of freedom.
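A minimal Python sketch of Equation 1, assuming SciPy is available. The function name `r2_change_f_test` and its argument names are illustrative choices, not taken from the report. `scipy.stats.f.sf` returns the upper-tail probability, which is the one-tailed site-level p value used later in tests of combined significance.

```python
from scipy.stats import f


def r2_change_f_test(r2_1: float, r2_2: float, n: int, a: int, b: int):
    """F-test for the change in squared multiple correlation (Equation 1).

    r2_1: R_1^2 for the a tests in battery 1
    r2_2: R_2^2 for the b tests in battery 2 (battery 1 plus the new tests)
    n:    number of examinees at the site
    Returns the F statistic and its upper-tail p value.
    """
    if not 0 <= a < b < n - 1:
        raise ValueError("requires 0 <= a < b < n - 1")
    df1, df2 = b - a, n - b - 1
    F = (r2_2 - r2_1) * df2 / ((1.0 - r2_2) * df1)
    p = f.sf(F, df1, df2)  # P(F_{b-a, n-b-1} >= observed F)
    return F, float(p)
```

For example, a hypothetical site with $n = 500$, $R_1^2 = .25$ from $a = 10$ tests, and $R_2^2 = .27$ from $b = 12$ tests would call `r2_change_f_test(0.25, 0.27, n=500, a=10, b=12)`.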


Case 2. If $R_1$ and $R_2$ are computed from the same sample but their predictor sets are disjoint (e.g., when $R_1$ is computed from a pre-enlistment test battery and $R_2$ is computed from a post-enlistment test battery), the usual F-test for change in multiple correlation cannot be used. Let $a$ be the number of predictors used to compute $R_1$ and $b$ be the number of predictors used to compute $R_2$. Here $b$ need not be larger than $a$. A large-sample test for the significance of the incremental validity uses the statistic

$$X^2 = \frac{(\tilde{R}_2^2 - \tilde{R}_1^2)^2}{\sigma^2(d^*)}, \qquad (2)$$

where $\tilde{R}_1^2$ and $\tilde{R}_2^2$ are the corrected squared multiple correlations for the two predictor sets and $\sigma^2(d^*)$ is the asymptotic variance of the difference in corrected squared multiple correlations. Much of the mathematical development in this paper will be devoted to estimating $\sigma^2(d^*)$. The test statistic (Equation 2) has a chi-square distribution with one degree of freedom when there is no incremental validity (but both of the validities are nonzero) and the sample size is large. The hypothesis of no incremental validity is rejected at significance level $\alpha$ if the computed value of $X^2$ exceeds the $100(1-\alpha)$ percentile point of the chi-square distribution with one degree of freedom.
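The decision rule for Equation 2 can be sketched in Python as follows, assuming SciPy. The variance of the difference is taken as an input because estimating it is a separate step; the function name and the example values are hypothetical.

```python
from scipy.stats import chi2


def disjoint_sets_chi2_test(r2_1: float, r2_2: float, var_dstar: float, alpha: float = 0.05):
    """Large-sample test of Equation 2 (same sample, disjoint predictor sets).

    r2_1, r2_2: corrected squared multiple correlations for the two predictor sets
    var_dstar:  an estimate of sigma^2(d*), the asymptotic variance of their
                difference; it must be obtained separately and passed in
    Returns (X^2, p value, whether H0 is rejected at level alpha).
    """
    if var_dstar <= 0:
        raise ValueError("variance must be positive")
    x2 = (r2_2 - r2_1) ** 2 / var_dstar
    critical = chi2.ppf(1.0 - alpha, df=1)  # 100(1 - alpha) percentile, 1 df
    return x2, float(chi2.sf(x2, df=1)), bool(x2 > critical)
```

With made-up inputs $\tilde{R}_1^2 = .20$, $\tilde{R}_2^2 = .23$, and $\sigma^2(d^*) = .0001$, the statistic is $X^2 = 9$, well beyond the 95th percentile of the chi-square distribution with one degree of freedom (about 3.84).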

$R_1$ and $R_2$ Computed from Independent Samples

If $R_1$ and $R_2$ are computed from independent samples (e.g., when $R_1$ is computed from the scores of subjects who took a paper and pencil test battery and $R_2$ is computed from the scores of subjects who took a computerized adaptive test battery), the usual F-test for change in multiple correlation cannot be used. Let $a$ be the number of predictors used to compute $R_1$ and $b$ be the number of predictors used to compute $R_2$. Here $b$ need not be larger than $a$. Let $n_1$ be the sample size on which $R_1$ is based and let $n_2$ be the sample size on which $R_2$ is based. A large-sample test for the significance of the incremental validity uses the statistic

$$X^2 = \frac{(\tilde{R}_2^2 - \tilde{R}_1^2)^2}{\sigma^2(\tilde{R}_1^2) + \sigma^2(\tilde{R}_2^2)}, \qquad (3)$$

where $\sigma^2(\tilde{R}_j^2)$ is the asymptotic variance of the corrected squared multiple correlation computed from the sample of size $n_j$; because the samples are independent, the variance of the difference is the sum of the two variances. The test statistic (Equation 3) has a chi-square distribution with one degree of freedom when there is no incremental validity (but both of the validities are nonzero) and both $n_1$ and $n_2$ are large. The hypothesis of no incremental validity is rejected at significance level $\alpha$ if the computed value of $X^2$ exceeds the $100(1-\alpha)$ percentile point of the chi-square distribution with one degree of freedom.
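A Python sketch of an independent-samples test of this form: the squared difference divided by the variance of the difference, which for independent samples is the sum of the two sampling variances. The report's own variance estimator is not reproduced in this section, so the sketch assumes the standard large-sample approximation $\mathrm{Var}(R^2) \approx 4\rho^2(1-\rho^2)^2/n$ with the sample value plugged in; that choice and the function names are illustrative assumptions. SciPy is assumed.

```python
from scipy.stats import chi2


def asymptotic_var_r2(r2: float, n: int) -> float:
    """Assumed large-sample variance of a squared multiple correlation:
    4 R^2 (1 - R^2)^2 / n (valid only when the population value is nonzero)."""
    return 4.0 * r2 * (1.0 - r2) ** 2 / n


def independent_samples_chi2_test(r2_1: float, n1: int, r2_2: float, n2: int):
    """Squared difference in R^2 over the variance of the difference.

    Because the two samples are independent, the variance of the difference is
    the sum of the two sampling variances. Returns (X^2, p value with 1 df).
    """
    var_diff = asymptotic_var_r2(r2_1, n1) + asymptotic_var_r2(r2_2, n2)
    if var_diff <= 0:
        raise ValueError("variance of the difference is zero; validities must be in (0, 1)")
    x2 = (r2_2 - r2_1) ** 2 / var_diff
    return x2, float(chi2.sf(x2, df=1))
```

The plug-in variance degenerates as $R^2$ approaches zero, which mirrors the requirement above that both validities be nonzero.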

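The design that computes $R_1$ and $R_2$ from the same sample yields more powerful tests for incremental validity, because two validities computed on the same individuals are typically positively correlated, which reduces the sampling variance of their difference. Consequently, it is the design of choice wherever it is feasible.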


Indices  of  Incremental  Validity 


Two indices of incremental validity might be computed. One index is simply the difference in validities. That is, the index $d$ is the difference in (unsquared) multiple correlations. The sample value of this index is

$$d = R_2 - R_1$$

and the population value is

$$\delta = \rho_2 - \rho_1.$$

The index is conventionally used in personnel psychology; for example, it is used in utility analyses of the kind described by Cronbach and Gleser (1965).


An alternative index of incremental validity is the R-squared change: the difference in squared multiple correlations. The sample value of this index is

$$d^* = R_2^2 - R_1^2$$

and the population value is

$$\delta^* = \rho_2^2 - \rho_1^2.$$

This index has the virtue that it is interpretable in terms of "additional variance accounted for" by the new test battery.
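For a single site where the predictor intercorrelations and predictor-criterion correlations are available, both indices can be computed from the standard identity $R^2 = \mathbf{r}_{xy}^\top \mathbf{R}_{xx}^{-1} \mathbf{r}_{xy}$. The sketch below assumes NumPy; the function names and the use of predictor index lists to define the two batteries are illustrative, not part of the report.

```python
import numpy as np


def squared_multiple_correlation(R_xx: np.ndarray, r_xy: np.ndarray) -> float:
    """R^2 of the criterion on a set of predictors: r_xy' R_xx^{-1} r_xy."""
    return float(r_xy @ np.linalg.solve(R_xx, r_xy))


def incremental_validity(R_xx, r_xy, battery1, battery2):
    """Return (d, d*) for two batteries given as lists of predictor indices.

    R_xx: predictor intercorrelation matrix; r_xy: predictor-criterion correlations.
    """
    R_xx, r_xy = np.asarray(R_xx, dtype=float), np.asarray(r_xy, dtype=float)
    idx1, idx2 = np.asarray(battery1), np.asarray(battery2)
    r2_1 = squared_multiple_correlation(R_xx[np.ix_(idx1, idx1)], r_xy[idx1])
    r2_2 = squared_multiple_correlation(R_xx[np.ix_(idx2, idx2)], r_xy[idx2])
    d = np.sqrt(r2_2) - np.sqrt(r2_1)  # difference in (unsquared) multiple correlations
    d_star = r2_2 - r2_1               # R-squared change
    return float(d), float(d_star)
```

With two uncorrelated predictors whose validities are .3 and .4, taking battery 1 as the first test and battery 2 as both tests gives $R_1 = .3$ and $R_2 = .5$, so $d = .2$ and $d^* = .16$.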

USING COMBINED SIGNIFICANCE TO STUDY INCREMENTAL VALIDITY

One of the oldest methodologies for combining the results of independent studies uses the observed significance levels ($p$'s, or probabilities) from a series of significance tests. Tests of combined significance use the probability values from a series of studies examining the same research question. Several combined significance methods have been outlined in the social-science literature by Rosenthal (1978) and previously by Mosteller and Bush (1954).
Although more than 15 distinct methods for summarizing observed
probabilities  have  been  proposed,  the  methods  share  some  similarities.  Table 
1  lists  the  methods  and  shows  that  they  fall  into  two  major  groups.  One  group 
comprises  tests  based  on  the  fact  that  the  observed  significance  levels  are 
uniformly  distributed  under  the  null  hypothesis  of  "no  effect"  in  any  study. 
The  other  methods  involve  transformations  of  the  observed  significance  levels 
to  other  statistical  variables  (e.g.,  probabilities  transformed  to  normal 
variates).  All  of  the  methods  listed  in  Table  1  provide  tests  of  the 
same  null  model  for  the  series  of  studies.  We  outline  that  model  for  studies 
of  incremental  validity  below. 


Table 1

Methods for Summarizing Independent Significance Values

Methods Requiring Transformation of p Values

  Indicator Function Methods
    Wilkinson method (Wilkinson, 1951)
    Tippett method (Tippett, 1931)
    Sign test
    Chi-square method

  Inverse Probability Methods
    Inverse Normal Distribution Methods
      Stouffer method (Stouffer et al., 1949)
      Weighted Stouffer method
      Mean z method
    Inverse t Distribution Method
      Winer method (Winer, 1971)
    Inverse Chi-square Distribution Methods
      Inverse chi-square method
      Weighted inverse chi-square method

  Logistic Function Methods
    Fisher method (Fisher, 1932)
    Good (weighted Fisher) method
    Logit method

Methods Not Requiring Transformation of p Values

    Sum of p's method
    Mean p method


Model  for  Study  Results 

Suppose that there are $k$ independent validity studies, each yielding a test of incremental validity. As an illustration, we consider the situation in which a single set of subjects provides the information on incremental validity within every study (i.e., the estimates of incremental validity are $d$ and $d^*$ as in our previous notation). Each study in the series tests the difference in validity between one battery of $a$ tests and a second battery of $b$ tests (where $b > a$). Each study might represent a training school or site for which it is useful to predict job performance.

As in the case in which parametric estimates of incremental validity are of interest, the $i$th study is assumed to provide a test of either $\delta_i$ or $\delta_i^*$. In study $i$, $\delta_i = \rho_{i2} - \rho_{i1}$ is the difference in unsquared multiple correlations in the population, and $\delta_i^* = \rho_{i2}^2 - \rho_{i1}^2$ is the difference in squared multiple correlations. The null hypothesis (the model of no added validity) for the $i$th study would be either

$$H_0: \delta_i = 0,$$

or, equivalently, $H_0: \delta_i^* = 0$.

In each study, the usual F-test for R-squared change provides a significance test of the null hypothesis based on the sample estimates of incremental validity. The F statistic for incremental validity for a sample of size $n$ from a single study or school is given by Equation 1, or

$$F = \frac{(R_2^2 - R_1^2)(n - b - 1)}{(1 - R_2^2)(b - a)}.$$

This statistic is distributed as a central F variate with $(b - a)$ and $(n - b - 1)$ degrees of freedom under the null model of no contribution to validity from the added test battery in the study. The significance value from this test is the probability ($p$) of observing a value equal to $F$ or larger in the F distribution with $(b - a)$ and $(n - b - 1)$ degrees of freedom.

Each F-test yields an observed upper-tail significance level. The observed probability from the $i$th study is $p_i$, and the data used in the combined significance tests are the $k$ one-tailed probability values

$$p_1, p_2, \ldots, p_k.$$

Null  Hypothesis 

The null hypothesis for tests of combined significance is an omnibus hypothesis, namely that the null hypothesis is true in every one of the studies in the synthesis. Thus the overall null model for any combined significance test for validity studies is

$$H_0: \delta_1 = \delta_2 = \cdots = \delta_k = 0$$

in the case of differences in multiple correlations, or, equivalently,

$$H_0: \delta_1^* = \delta_2^* = \cdots = \delta_k^* = 0$$

when differences in squared correlations represent added validity. Regardless of the parameter(s) used to represent validity, the null model for the series of studies is that the additional tests do not increase validity in any population studied.
One assumption of the combined significance methods is that the alternatives to the null hypothesis are one-sided. Typically this assumption appears as a restriction that the parameter tested in each study cannot be negative, and it leads to the condition that only one-tailed significance values are used in tests of combined significance. In some cases, this restriction requires redefining the parameters of the hypothesis for each study. For example, if both negative and positive $\delta$ values were of interest, the hypothesis might be restated by defining the parameter of interest as $\delta^2$ rather than $\delta$. In the case of validity studies, the alternative hypothesis is naturally one-sided, because additional tests can only increase the validity of prediction in each population, not decrease it.

Though  the  null  hypothesis  for  the  combined  significance  summaries  is 
quite  simple,  deviations  from  the  null  hypothesis  can  occur  in  a  variety  of 
different  ways.  Thus,  the  interpretation  of  a  rejected  null  hypothesis  is  not 
completely  straightforward  (see  also  Becker,  1987). 

Let  us  consider  the  validity-study  context.  One  way  that  the  null  model 
can  be  false  is  if  all  populations  studied  show  increased  validity  because  of 
the  added  tests.  However,  the  null  model  is  also  false  when  a  single 
population  shows  an  increase  in  validity  and  the  others  do  not.  Both  of  these 
outcomes  should  lead  to  the  rejection  of  the  null  hypothesis  based  on  a  test 
of  combined  significance,  but  they  represent  situations  that  are  qualitatively 
very  different. 

Additionally,  the  combined  significance  tests  themselves  perform 
differently  with  regard  to  the  detection  of  these  various  patterns  of  outcomes 
(i.e.,  different  alternatives  to  the  null  model).  Statistical  theory  (e.g., 
Oosterhoff,  1969)  has  shown  that  none  of  the  combined  significance  tests  is 
uniformly  most  powerful  against  all  alternative  hypotheses.  Empirical  results 
from  simulation  (Monte  Carlo)  studies  (e.g.,  Becker,  1985;  George,  1977) 
provide  some  guidelines  for  the  selection  of  a  test  procedure  and  show  that, 
in some cases, differences in power among tests are slight. Ideally, however, the choice of a statistical procedure should depend on both the nature of the expected (or interesting) outcomes of the series of studies and the behavior of the available tests.

Combined  Significance  Methods 

Rosenthal  (1978)  reviewed  eight  combined  significance  tests,  and  there 
are  others  (e.g.,  Mudholkar  &  George,  1979).  We  present  two  combined 
significance  methods.  One  is  the  method  most  highly  recommended  by  Rosenthal — 
the  Stouffer  method,  which  is  the  "method  of  adding  z's"  described  by 
Stouffer,  Suchman,  DeVinney,  Star,  and  Williams  (1949).  The  second  method  was 
suggested  by  Fisher  (1932). 

We  have  selected  these  two  tests  because  power  studies  (Becker,  1985; 
Koziol  &  Perlman,  1978)  have  shown  that  these  tests  perform  in  a  complementary 
manner.  Specifically,  the  Stouffer  test  appears  to  have  good  power  to  detect 
alternatives  in  which  all  the  populations  studied  show  roughly  equal  effect 
sizes.  In  the  validity-study  context,  this  would  be  a  situation  in  which  the 
increases  in  validity  were  roughly  the  same  for  all  schools  or  job-groups 
studied. Fisher's method has higher power to detect individual (or small
numbers  of)  discrepant  populations.  Such  patterns  might  arise  when  the  added 
test  batteries  increased  validity  in  only  a  few  schools. 

Stouffer’s  Method 

Stouffer's test of combined significance (Stouffer et al., 1949) is obtained by summing the standard normal deviates or $z$ values associated with the values $p_1$ through $p_k$. The sum is divided by the square root of $k$ (the number of $p$ values), which is the standard deviation of the sum of the $k$ standard normal deviates. The test statistic for this ratio is

$$Z_S = \sum_{i=1}^{k} z(p_i) \Big/ \sqrt{k}, \qquad (4)$$

where $z(p_i) = \Phi^{-1}(1 - p_i)$ represents the standard normal deviate associated with upper-tail probability $p_i$ from the $i$th study, and $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function. This test can be computed using the mathematical and statistical functions of programs such as SAS (1990), Minitab (Ryan, Joiner, & Ryan, 1985), or SPSS (1988). FORTRAN programs can also be written to produce the combined significance values. A listing of a SAS program appears in Appendix A.

The statistic in Equation 4 is compared with upper-tail critical values from a table of the standard normal distribution. The test is not conducted as a two-sided test because negative $Z_S$ values do not have a meaningful interpretation in this context. Negative $Z_S$ values result from combinations of negative $z(p_i)$ values, which in turn result from $p$ values larger than 0.5; that is, from nonsignificant individual test results. Thus large negative $Z_S$ values do not represent interesting deviations from the conditions specified by the null hypothesis for the series of studies.
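A minimal Python sketch of Equation 4, offered as an alternative to the SAS listing in Appendix A and assuming SciPy is available. `norm.isf(p)` computes $\Phi^{-1}(1 - p)$, and the combined p value is the upper-tail normal probability of $Z_S$, consistent with the one-sided use just described. The function name is illustrative.

```python
import math

from scipy.stats import norm


def stouffer(p_values):
    """Stouffer's combined test (Equation 4) on one-tailed p values.

    Returns (Z_S, combined upper-tail p value).
    """
    k = len(p_values)
    z = [norm.isf(p) for p in p_values]  # z(p_i) = Phi^{-1}(1 - p_i)
    z_s = sum(z) / math.sqrt(k)
    return z_s, float(norm.sf(z_s))      # upper tail only: negative Z_S is not of interest
```

As a check, two sites each with $p = .05$ give $Z_S \approx 2.33$ and a combined $p \approx .01$.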

Fisher’s  Method 

A second widely used method for combining probabilities was suggested by Fisher (1932). A related version of this test was also independently described by Pearson (1933). The method requires the transformation of the independent probabilities via the natural log function. These values are multiplied by the constant $-2$, which produces (under $H_0$) a set of independent, identically distributed chi-square variates, each with 2 degrees of freedom. The Fisher test statistic is

$$C_F = -2 \sum_{i=1}^{k} \ln(p_i), \qquad (5)$$

which is a chi-square variable with $2k$ degrees of freedom under $H_0$. The computation of the Fisher test is also shown in the SAS program in Appendix A. If the probability values associated with the significance test for change in multiple correlation are available, then Fisher's test can also be computed using most spreadsheets (which typically feature the log function).
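Equation 5 in the same illustrative style (SciPy assumed; the function name is hypothetical). The logarithm must be the natural log for $-2\ln p_i$ to be chi-square with 2 degrees of freedom, and every $p_i$ must be strictly positive.

```python
import math

from scipy.stats import chi2


def fisher(p_values):
    """Fisher's combined test (Equation 5).

    C_F = -2 * sum(ln p_i), referred to chi-square with 2k degrees of freedom.
    Returns (C_F, combined upper-tail p value).
    """
    k = len(p_values)
    c_f = -2.0 * sum(math.log(p) for p in p_values)  # natural log; p_i must be > 0
    return c_f, float(chi2.sf(c_f, df=2 * k))
```

With a single study the combined p value equals that study's own p value, because a chi-square variate with 2 degrees of freedom has upper tail $e^{-x/2}$; this makes a convenient check of an implementation.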
Use of Multiple Combined Significance Tests

Some authors have recommended the use of several combined significance tests together. Using multiple combined significance tests, or using combined significance methods together with techniques for pooling or estimating common study results, is fairly common, but it leads to slightly elevated levels of Type I error. Elevated error rates occur when several combined significance summaries are applied because they are based on the same data (the $p$'s). Nor are those data independent of the estimates of study outcomes (e.g., incremental validities). The usual Bonferroni method (Miller, 1966) can be applied to protect the overall significance level of the set of tests if it is necessary to compute several combined significance summaries.
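When both the Stouffer and Fisher summaries are reported for the same p values, the Bonferroni adjustment amounts to testing each summary at $\alpha/m$, where $m$ is the number of summaries. The sketch below assumes SciPy's `scipy.stats.combine_pvalues`, whose `"stouffer"` and `"fisher"` options compute the unweighted statistics of Equations 4 and 5; the wrapper function is hypothetical.

```python
from scipy.stats import combine_pvalues


def bonferroni_combined_tests(p_values, alpha=0.05, methods=("stouffer", "fisher")):
    """Apply several combined significance summaries to the same p values,
    testing each at alpha / m so the overall Type I error rate is at most alpha.

    Returns {method: (statistic, combined p value, rejected?)}.
    """
    m = len(methods)
    results = {}
    for method in methods:
        statistic, combined_p = combine_pvalues(p_values, method=method)
        results[method] = (float(statistic), float(combined_p), bool(combined_p <= alpha / m))
    return results
```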

Validity Studies Based on Independent Samples

Occasionally validity studies compare $R$ or $R^2$ values computed for independent samples of subjects. In such cases, the test of incremental validity in the individual studies will not be the F-test for change in multiple correlation.

However, combined significance methods can be applied to the probabilities from tests based on independent samples in the same manner described above. The SAS routine in Appendix A would need to be modified by replacing the computed F-test with the test described for use in individual validity studies based on independent samples. This is the test given in Equation 3. In such cases, the probability values $p_1$ through $p_k$ would be obtained from the series of tests from the $k$ schools. Computation of the combined significance tests would proceed exactly as outlined above.

Furthermore,  the  nonparametric  form  of  the  combined  significance  methods 
does  not  preclude  combining  p's  from  different  validity-study  designs  (i.e., 
p's  from  tests  based  on  dependent  samples  and  p's  from  independent  samples). 
However,  in  order  for  the  summaries  to  be  most  meaningful,  all  studies  should 
examine  the  same  hypothesis  (or  very  similar  hypotheses)  about  incremental 
validity. 


POOLING  ESTIMATES  OF  INCREMENTAL  VALIDITIES 
Correcting Incremental Validities for Artifacts of Restriction of Range and Criterion Unreliability

Although  the  incremental  validity  estimates  d  or  d*  may  be  of  interest 
in  a  validity  study  at  a  single  site,  they  may  not  be  directly  comparable 
across  sites.  The  reason  is  that  these  indices  of  incremental  validity  in  a 
site  are  attenuated  by  range  restriction  and  the  unreliability  of  the 
criterion  variable.  Since  range  restriction  and  criterion  reliability  are 
artifacts  of  the  design  of  the  validity  study,  one  could  say  that  correction 
of  estimates  for  artifacts  is  necessary  before  pooling  the  estimates, 
following  the  validity-generalization  tradition  (Schmidt  &  Hunter,  1977). 

An alternative characterization is to say that the estimates $d$ and $d^*$ from validity studies at different sites are actually estimating different quantities. For example, in study 1, $d_1$ estimates the population value $\delta_1$ of the difference between two (multiple) correlations of test batteries with an unreliable criterion in a restricted population of test scores. The value $d_2$ in study 2 estimates the population value $\delta_2$, which is the difference between multiple correlations of test scores with a different criterion (and hence a different criterion reliability) than that of study 1, in a different restricted population than that of study 1. The estimates $d_1$ from study 1 and $d_2$ from study 2 estimate conceptually different parameters $\delta_1$ and $\delta_2$. That is, the parameters $\delta_1$ and $\delta_2$ arise as descriptions of different populations.

It does not make sense to pool estimates of different parameters. Hence we would not pool $d_1$ and $d_2$ directly. Instead we specify a single parameter
that  might  be  estimated  from  each  study.  Perhaps  the  simplest  parameter  to 
estimate  from  each  study  is  the  validity  increment  that  would  be  obtained  in 
the  unrestricted  population  (e.g.,  the  total  applicant  pool)  if  the  criterion 
were  perfectly  reliable.  By  computing  an  estimate  of  this  quantity  in  each 
study,  all  studies  will  be  estimating  the  same  conceptual  parameter  and  hence 
pooling  across  studies  will  be  sensible.  If  such  a  quantity  cannot  be 
estimated  in  each  study,  an  alternative  to  pooling  is  the  use  of  nonparametric 
combined  significance  summaries,  as  described  above. 

Several potential approaches to correcting estimates for unreliability and restriction of range can produce the desired estimates. They combine the correction for attenuation due to measurement unreliability (e.g., Lord & Novick, 1968, p. 70) with a correction for attenuation due to restriction of range. Perhaps the most elegant
correction  for  the  effects  of  range  restriction  on  correlations  is  that  based 
on  the  multivariate  correction  of  the  covariance  matrix  given  by  Lawley 
(1943).  Unfortunately,  it  is  not  easy  to  derive  the  effects  of  this 
correction  on  the  variance  of  the  "corrected"  correlations  when  the 
covariances  that  enter  into  the  correction  are  themselves  uncertain.  However, 
this  correction  could  be  used  and  its  effect  treated  as  a  multiplicative 
constant.  While  this  would  effectively  ignore  the  uncertainty  introduced  into 
the  estimated  correlations  by  the  correction  for  range  restriction,  the 
effects  of  this  uncertainty  are  likely  to  be  relatively  small.  Moreover  the 
Lawley  correction  permits  the  operational  test  scales  that  are  the  actual 
basis  of  the  selection  to  be  treated  as  explicit  selection  variables  while  the 
criterion  and  test  scales  not  involved  in  determining  selection  are  treated  as 
incidental  variables  whose  range  is  affected  by  selection  on  the  other 
variables. 
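The Lawley correction can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the usual Pearson-Lawley formulas for explicit selection on a subset of variables; the function names `lawley_correct` and `cov_to_corr` are hypothetical, not taken from this report. The sketch takes the covariance matrix observed in the selected sample, the unrestricted covariance matrix of the explicit selection variables (for example, the operational test scales in the applicant pool), and the indices of those variables, and returns the implied covariance matrix of all variables in the unrestricted population.

```python
import numpy as np


def lawley_correct(v_restricted, sigma_xx, explicit):
    """Pearson-Lawley correction for explicit selection (illustrative sketch).

    v_restricted -- covariance matrix of all variables in the selected sample
    sigma_xx     -- unrestricted covariance matrix of the explicit selection variables
    explicit     -- indices of the explicit selection variables in v_restricted
    Returns the implied covariance matrix in the unrestricted population.
    """
    v = np.asarray(v_restricted, dtype=float)
    idx_x = np.asarray(explicit)
    idx_y = np.setdiff1d(np.arange(v.shape[0]), idx_x)   # incidental variables
    v_xx = v[np.ix_(idx_x, idx_x)]
    v_xy = v[np.ix_(idx_x, idx_y)]
    v_yy = v[np.ix_(idx_y, idx_y)]
    b = np.linalg.solve(v_xx, v_xy)                       # V_xx^{-1} V_xy
    s_xx = np.asarray(sigma_xx, dtype=float)
    s_xy = s_xx @ b                                       # Sigma_xx V_xx^{-1} V_xy
    s_yy = v_yy - v_xy.T @ b + b.T @ s_xx @ b             # incidental block
    out = np.empty_like(v)
    out[np.ix_(idx_x, idx_x)] = s_xx
    out[np.ix_(idx_x, idx_y)] = s_xy
    out[np.ix_(idx_y, idx_x)] = s_xy.T
    out[np.ix_(idx_y, idx_y)] = s_yy
    return out


def cov_to_corr(s):
    """Rescale a covariance matrix to a correlation matrix."""
    sd = np.sqrt(np.diag(s))
    return s / np.outer(sd, sd)
```

For a single explicit selection variable the result reduces to the familiar univariate formula $ru/\sqrt{1 - r^2 + r^2u^2}$, where $u$ is the ratio of the unrestricted to the restricted standard deviation of the selection variable. For instance, with a restricted standard deviation of 0.6 (unrestricted 1.0) and a restricted correlation of .30, `cov_to_corr(lawley_correct([[0.36, 0.18], [0.18, 1.0]], [[1.0]], [0]))[0, 1]` is about .464.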

Two other alternatives are less satisfying. One is to estimate range restriction on the criterion (outcome) variable and to correct the correlations via the univariate (Spearman) approach (e.g., Lord & Novick, 1968, p. 145). This approach yields results that are mathematically equivalent to those from the Lawley correction, but it requires knowledge of the criterion variance in both the restricted and unrestricted populations, which is usually unrealistic. A second alternative is to estimate, for each multiple correlation, the linear combination of predictors that has the highest correlation with the criterion (i.e., the predicted score yielded by the regression equation), compute the variance of this composite in the restricted and in the unrestricted populations, estimate its restriction of range, and apply the Spearman correction. Because this alternative involves a considerable amount of computation, it is not recommended.


Let $R_1$ and $R_2$ denote the sample multiple correlations, and let $R_1^{*}$ and $R_2^{*}$ denote $R_1$ and $R_2$ after correction for restriction of range via the Lawley correction. Define the relative corrections of $R_1$ and $R_2$ (which we treat as known constants for a given site) as

$$c_1 = R_1^{*}/R_1 \quad\text{and}\quad c_2 = R_2^{*}/R_2 .$$


Let $\rho_1$ and $\rho_2$ be the population values of the multiple correlations between test batteries 1 and 2, respectively, and the true score on the criterion in the unrestricted population. Sample estimates $\hat R_1$ and $\hat R_2$ of $\rho_1$ and $\rho_2$, respectively, are

$$\hat R_1 = \frac{c_1 R_1}{\sqrt{\gamma}} = \frac{R_1^{*}}{\sqrt{\gamma}} \qquad (6)$$

$$\hat R_2 = \frac{c_2 R_2}{\sqrt{\gamma}} = \frac{R_2^{*}}{\sqrt{\gamma}} \qquad (7)$$

where $\gamma$ is the reliability of the criterion, which is assumed to be known.

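The corrected correlations $\hat R_1$ and $\hat R_2$ can be used to construct estimates $\hat d$ and $\hat d^{*}$ of the incremental validities $\delta = \rho_2 - \rho_1$ and $\delta^{*} = \rho_2^2 - \rho_1^2$ in the unrestricted population via

$$\hat d = \hat R_2 - \hat R_1 \quad\text{and}\quad \hat d^{*} = \hat R_2^2 - \hat R_1^2 .$$

The same relations hold at the population level: if $\tilde\rho_1$ and $\tilde\rho_2$ denote the multiple correlations with the observed (unreliable) criterion in the restricted population, then

$$\rho_1 = c_1\tilde\rho_1/\sqrt{\gamma}, \quad\text{and similarly,}\quad \rho_2 = c_2\tilde\rho_2/\sqrt{\gamma} .$$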
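Once the multiple correlations and the correction factors are available, Equations 6 and 7 and the corrected increments are simple to compute. The sketch below is illustrative only. `multiple_r` computes a multiple correlation from a correlation matrix via the normal equations, and `corrected_increments` applies Equations 6 and 7, treating $c_1$, $c_2$, and $\gamma$ as known constants, as the text does. Both function names are hypothetical.

```python
import numpy as np


def multiple_r(corr, criterion, predictors):
    """Multiple correlation of variable `criterion` with the `predictors`
    (all given as indices into the correlation matrix `corr`)."""
    c = np.asarray(corr, dtype=float)
    p = list(predictors)
    r_pc = c[p, criterion]            # predictor-criterion correlations
    r_pp = c[np.ix_(p, p)]            # predictor intercorrelations
    return float(np.sqrt(r_pc @ np.linalg.solve(r_pp, r_pc)))


def corrected_increments(r1, r2, c1, c2, gamma):
    """Equations 6 and 7 plus the increments d-hat and d*-hat.

    r1, r2 -- sample multiple correlations in the (restricted) study sample
    c1, c2 -- relative range-restriction corrections R*/R, treated as constants
    gamma  -- criterion reliability, assumed known
    """
    r1_hat = c1 * r1 / np.sqrt(gamma)
    r2_hat = c2 * r2 / np.sqrt(gamma)
    return {
        "R1_hat": r1_hat,
        "R2_hat": r2_hat,
        "d_hat": r2_hat - r1_hat,                  # increment in R
        "d_star_hat": r2_hat**2 - r1_hat**2,       # increment in R squared
    }
```

Applying `multiple_r` to a range-corrected correlation matrix gives $R_i^{*}$, and dividing by the uncorrected value gives $c_i$.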


Correcting Incremental Validities for Shrinkage

Sample estimates of multiple correlations are biased estimators of the population multiple correlation. The bias depends on the sample size $n$ and the number $a$ of predictor variables. The bias in $R^2$ as an estimate of $\rho^2$ is approximately

$$\mathrm{BIAS}(R^2) = E(R^2) - \rho^2 \approx \frac{a(1-\rho^2)}{n-1} - \frac{2(n-a-1)\rho^2(1-\rho^2)}{n^2-1},$$

where the approximation is obtained by ignoring terms proportional to $1/n^2$ (see, e.g., Johnson & Kotz, 1970, p. 244). Because estimates of incremental validity are differences between multiple correlations, we are interested in the bias of estimates of the differences. Suppose $R_1^2$ is computed with $a$ predictors and $R_2^2$ with $b$ predictors. For the purpose of computing a qualitative estimate of bias, approximate the true squared correlations $\rho_1^2$ and $\rho_2^2$ by their average $\bar\rho^2 = (\rho_1^2 + \rho_2^2)/2$. Then

$$\mathrm{BIAS}(R_2^2 - R_1^2) \approx \frac{(b-a)(1-\bar\rho^2)}{n-1}.$$


If $\bar\rho^2 = .4$, values of $b-a$ of 4, 5, and 9 imply bias in incremental validity estimates of approximately .0048, .0060, and .0108, respectively, for a study with $n = 500$, and bias of .0024, .0030, and .0054, respectively, for a study with $n = 1000$. While these biases are not large in absolute terms, they may
not  be  negligible  in  terms  of  the  incremental  validities  of  interest.  This  is 
particularly  true  for  sample  sizes  of  less  than  1000. 
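Both bias approximations are easy to evaluate. The following minimal sketch (function names hypothetical) reproduces the figures quoted in the previous paragraph. The model-1 size $a = 10$ is an arbitrary illustrative value, since only $b - a$ enters the increment approximation.

```python
def approx_r2_bias(n, a, rho2):
    """Approximate bias E(R^2) - rho^2 for a predictors (terms in 1/n^2 dropped)."""
    return a * (1 - rho2) / (n - 1) - 2 * (n - a - 1) * rho2 * (1 - rho2) / (n**2 - 1)


def approx_increment_bias(n, a, b, rho2_bar):
    """Approximate bias of R2^2 - R1^2 when model 2 adds b - a predictors,
    with both squared validities replaced by their average rho2_bar."""
    return (b - a) * (1 - rho2_bar) / (n - 1)


# Reproduce the figures in the text (rho-bar squared = .4; b - a = 4, 5, 9).
for n in (500, 1000):
    print(n, [round(approx_increment_bias(n, 10, 10 + k, 0.4), 4) for k in (4, 5, 9)])
# 500 [0.0048, 0.006, 0.0108]
# 1000 [0.0024, 0.003, 0.0054]
```

The difference `approx_r2_bias(n, b, rho2) - approx_r2_bias(n, a, rho2)` agrees with `approx_increment_bias` up to a term of order $1/n^2$, which is why that term can be dropped.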

If  sample  sizes  of  incremental  validity  studies  are  less  than  a  few 
thousand,  then  a  correction  for  bias  is  desirable.  The  correction  for 
shrinkage  given  by  Wherry  (1931)  or  the  more  complex  correction  given  by  Olkin 
and  Pratt  (1958)  could  be  applied.  Because  there  is  very  little  difference 
between  the  effects  of  these  two  corrections,  the  simpler  correction  by  Wherry 
may  be  preferable  in  practice.  Olkin  and  Pratt  note  that,  because  their 
correction  is  proportional  to  1/n,  it  has  no  effect  on  the  large  sample 
distribution  of  the  multiple  correlation.  This  is  also  true  of  Wherry's 
correction.  Consequently  the  large  sample  variances  given  here  also  apply  to 
estimates  corrected  for  shrinkage  by  either  of  these  two  methods. 
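A shrinkage-corrected increment can be sketched as the difference of two adjusted squared multiple correlations. The formula below is the form most commonly attributed to Wherry (1931); treat it as an assumption about which of Wherry's variants is intended. The function names and the study values in the example are hypothetical.

```python
def wherry_adjusted_r2(r2, n, p):
    """Shrinkage-corrected R^2 in the form commonly attributed to Wherry (1931):
    1 - (1 - R^2)(n - 1)/(n - p - 1), for p predictors and sample size n."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def adjusted_increment(r2_1, r2_2, n, a, b):
    """Shrinkage-corrected estimate of rho2^2 - rho1^2 (a and b predictors)."""
    return wherry_adjusted_r2(r2_2, n, b) - wherry_adjusted_r2(r2_1, n, a)


# Hypothetical study: n = 500, R1^2 = .40 with 10 predictors, R2^2 = .42 with 14.
raw = 0.42 - 0.40
adj = adjusted_increment(0.40, 0.42, 500, 10, 14)
print(round(raw - adj, 4))   # amount of shrinkage removed from the raw increment
```

In this example the correction removes about .0045 from the raw increment of .02. That is close to the .0047 given by the approximate bias formula above with $\bar\rho^2 = .41$ and $b - a = 4$.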

The Statistical Properties of Incremental Validities

The incremental validity estimates $d$, $d^{*}$, $\hat d$, and $\hat d^{*}$ are influenced by sampling variation. Their exact sampling distributions are not known, but large-sample approximations have been derived that are quite accurate when the sample sizes are several hundred or larger. It can be shown that in large samples (when the predictor sets are disjoint or the samples are independent), the validity increments $d$, $d^{*}$, $\hat d$, and $\hat d^{*}$ have normal distributions with means at the true incremental validities they estimate and variances that can be calculated (estimated) from the matrices of correlations among predictors and criterion.¹ The complexity of the expression for the variance of the incremental validity depends on whether the two multiple correlations used to compute the index are based on different samples and thus can be treated as independent.


¹ This holds only if $\delta > 0$, or if the samples are independent, or if the predictor sets are disjoint. See Appendix B for an alternative approach when the second predictor set includes the first.
Index $d$ for $R_1$ and $R_2$ Computed from Independent Samples

Let $n_1$ be the size of the sample used to compute $R_1$ and let $n_2$ be the size of the sample used to compute $R_2$. In large samples $d$ has a mean of approximately $\delta$ and a variance of approximately

$$\sigma_\infty^2(d) \approx \frac{(1-R_1^2)^2}{n_1} + \frac{(1-R_2^2)^2}{n_2}.$$

Index $d^{*}$ for $R_1$ and $R_2$ Computed from Independent Samples

Let $n_1$ and $n_2$ be the sample sizes used to compute $R_1$ and $R_2$, respectively. Then in large samples $d^{*}$ has a mean of approximately $\delta^{*}$ and a variance of approximately

$$\sigma_\infty^2(d^{*}) \approx \frac{4R_1^2(1-R_1^2)^2}{n_1} + \frac{4R_2^2(1-R_2^2)^2}{n_2}.$$

Index $d$ for $R_1$ and $R_2$ Computed from the Same Sample

Let $n$ be the sample size. Because $R_1$ and $R_2$ are computed from the same sample, they are stochastically dependent and hence

$$\mathrm{Var}(d) = \mathrm{Var}(R_2 - R_1) = \mathrm{Var}(R_2) + \mathrm{Var}(R_1) - 2\,\mathrm{Cov}(R_1, R_2).$$

Hence in large samples $d$ has a mean of approximately $\delta$ and a variance of approximately

$$\sigma_\infty^2(d) \approx \frac{(1-R_1^2)^2}{n} + \frac{(1-R_2^2)^2}{n} - \frac{2\,\mathrm{Cov}_\infty(R_1, R_2)}{n}. \qquad (8)$$

The computation of $\mathrm{Cov}_\infty(R_1, R_2)$ from the matrix of test and criterion correlations is described starting on page 28 (Result 2).

Index $d^{*}$ for $R_1$ and $R_2$ Computed from the Same Sample

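Let $n$ be the sample size. Because $R_1$ and $R_2$ are computed from the same sample, they are stochastically dependent and hence

$$\mathrm{Var}(d^{*}) = \mathrm{Var}(R_2^2 - R_1^2) = \mathrm{Var}(R_2^2) + \mathrm{Var}(R_1^2) - 2\,\mathrm{Cov}(R_1^2, R_2^2).$$

Hence in large samples $d^{*}$ has a mean of approximately $\delta^{*}$ and a variance of approximately

$$\sigma_\infty^2(d^{*}) \approx \frac{4R_1^2(1-R_1^2)^2}{n} + \frac{4R_2^2(1-R_2^2)^2}{n} - \frac{2\,\mathrm{Cov}_\infty(R_1^2, R_2^2)}{n}. \qquad (9)$$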
The computation of $\mathrm{Cov}_\infty(R_1^2, R_2^2)$ from the matrix of test and criterion correlations is described starting on page 27 (Result 2).

Index $\hat d$ for $\hat R_1$ and $\hat R_2$ Computed from Independent Samples

Let $n_1$ be the size of the sample used to compute $\hat R_1$ and let $n_2$ be the size of the sample used to compute $\hat R_2$. In large samples $\hat d$ has a mean of approximately $\delta$ and a variance of approximately

$$\sigma_\infty^2(\hat d) \approx \frac{c_1^2(1-R_1^2)^2}{n_1\gamma} + \frac{c_2^2(1-R_2^2)^2}{n_2\gamma}, \qquad (10)$$

where $c_1$ and $c_2$ are the correction factors for restriction of range and $\gamma$ is the reliability of the criterion.


Index $\hat d^{*}$ for $\hat R_1$ and $\hat R_2$ Computed from Independent Samples

Let $n_1$ and $n_2$ be the sample sizes used to compute $\hat R_1$ and $\hat R_2$, respectively. Then in large samples $\hat d^{*}$ has a mean of approximately $\delta^{*}$ and a variance of approximately

$$\sigma_\infty^2(\hat d^{*}) \approx \frac{4c_1^4R_1^2(1-R_1^2)^2}{n_1\gamma^2} + \frac{4c_2^4R_2^2(1-R_2^2)^2}{n_2\gamma^2}, \qquad (11)$$

where $c_1$ and $c_2$ are the correction factors for restriction of range and $\gamma$ is the reliability of the criterion.

Index $\hat d$ for $\hat R_1$ and $\hat R_2$ Computed from the Same Sample

Let $n$ be the sample size. Because $\hat R_1$ and $\hat R_2$ are computed from the same sample, they are stochastically dependent and hence

$$\mathrm{Var}(\hat d) = \mathrm{Var}(\hat R_2 - \hat R_1) = \mathrm{Var}(\hat R_2) + \mathrm{Var}(\hat R_1) - 2\,\mathrm{Cov}(\hat R_1, \hat R_2).$$

Hence in large samples $\hat d$ has a mean of approximately $\delta$ and a variance of approximately

$$\sigma_\infty^2(\hat d) \approx \frac{c_1^2(1-R_1^2)^2}{n\gamma} + \frac{c_2^2(1-R_2^2)^2}{n\gamma} - \frac{2c_1c_2\,\mathrm{Cov}_\infty(R_1, R_2)}{n\gamma}, \qquad (12)$$

where $c_1$ and $c_2$ are the correction factors for restriction of range and $\gamma$ is the reliability of the criterion. The computation of $\mathrm{Cov}_\infty(R_1, R_2)$ from the matrix of test and criterion correlations is described starting on page 28 (Result 2).

Index $\hat d^{*}$ for $\hat R_1$ and $\hat R_2$ Computed from the Same Sample

Let $n$ be the sample size. Because $\hat R_1$ and $\hat R_2$ are computed from the same sample, they are stochastically dependent and hence

$$\mathrm{Var}(\hat d^{*}) = \mathrm{Var}(\hat R_2^2 - \hat R_1^2) = \mathrm{Var}(\hat R_2^2) + \mathrm{Var}(\hat R_1^2) - 2\,\mathrm{Cov}(\hat R_1^2, \hat R_2^2).$$

Hence in large samples $\hat d^{*}$ has a mean of approximately $\delta^{*}$ and a variance of approximately

$$\sigma_\infty^2(\hat d^{*}) \approx \frac{4c_1^4R_1^2(1-R_1^2)^2}{n\gamma^2} + \frac{4c_2^4R_2^2(1-R_2^2)^2}{n\gamma^2} - \frac{2c_1^2c_2^2\,\mathrm{Cov}_\infty(R_1^2, R_2^2)}{n\gamma^2}, \qquad (13)$$

where $c_1$ and $c_2$ are the correction factors for restriction of range and $\gamma$ is the reliability of the criterion. The computation of $\mathrm{Cov}_\infty(R_1^2, R_2^2)$ from the matrix of test and criterion correlations is described starting on page 27 (Result 2).
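Equations 10 through 13 are the uncorrected formulas with each $R_i$ rescaled by $c_i/\sqrt{\gamma}$. That rescaling multiplies variances and covariances of the $R$'s by $c_ic_j/\gamma$, and those of the $R^2$'s by $c_i^2c_j^2/\gamma^2$. The sketch below uses hypothetical names. `r1` and `r2` are the uncorrected sample values, and the covariances are the uncorrected ones from Result 2.

```python
def var_dhat_independent(r1, r2, n1, n2, c1, c2, gamma):
    """Equation 10."""
    return (c1**2 * (1 - r1**2) ** 2 / n1 + c2**2 * (1 - r2**2) ** 2 / n2) / gamma


def var_dstar_hat_independent(r1, r2, n1, n2, c1, c2, gamma):
    """Equation 11."""
    t1 = 4 * c1**4 * r1**2 * (1 - r1**2) ** 2 / n1
    t2 = 4 * c2**4 * r2**2 * (1 - r2**2) ** 2 / n2
    return (t1 + t2) / gamma**2


def var_dhat_same(r1, r2, n, cov_r1_r2, c1, c2, gamma):
    """Equation 12; cov_r1_r2 is Cov_inf(R1, R2) of the uncorrected correlations."""
    num = c1**2 * (1 - r1**2) ** 2 + c2**2 * (1 - r2**2) ** 2 - 2 * c1 * c2 * cov_r1_r2
    return num / (n * gamma)


def var_dstar_hat_same(r1, r2, n, cov_r1sq_r2sq, c1, c2, gamma):
    """Equation 13; cov_r1sq_r2sq is Cov_inf(R1^2, R2^2) of the uncorrected values."""
    num = (4 * c1**4 * r1**2 * (1 - r1**2) ** 2
           + 4 * c2**4 * r2**2 * (1 - r2**2) ** 2
           - 2 * c1**2 * c2**2 * cov_r1sq_r2sq)
    return num / (n * gamma**2)
```

With $c_1 = c_2 = 1$ and $\gamma = 1$ these reduce to the uncorrected expressions, which is a convenient check when implementing them.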

Combining Estimates of Incremental Validity

Statistical  methods  for  pooling  results  of  incremental  validity  studies 
are  quite  similar  regardless  of  the  indexes  used  to  represent  incremental 
validity.  All  are  based  on  statistical  theory  for  combining  asymptotically 
normal  independent  estimators  (see  Hedges,  1983).  They  are  described 
generically in this section so that they can be applied to either of the indexes ($d$ or $d^{*}$) previously discussed.

Suppose that there are $k$ independent validity studies, each of which yields an estimate $T$ of incremental validity with a standard error $S(T)$. Here $T$ may be either of the indexes $d$ or $d^{*}$ described previously. Using a subscript to denote the study from which an estimate is obtained, $T_i$ is the estimated incremental validity in the $i$th study and $\theta_i$ is the corresponding incremental validity parameter. Thus the data from $k$ studies are the estimates $T_1, \ldots, T_k$ and their standard errors $S(T_1), \ldots, S(T_k)$.

If all of the studies provide estimates of a common incremental validity parameter, that is, if $\theta_1 = \cdots = \theta_k = \theta$, then a weighted linear combination of $T_1, \ldots, T_k$ produces the most precise combined estimate (Hedges, 1983). (See Appendix B for an alternative approach when the second predictor set includes the first.) The optimal linear combination $T_{\cdot}$ weights each $T_i$ by the inverse of its variance $S^2(T_i)$, namely

$$T_{\cdot} = \frac{\sum_{i=1}^{k} w_i T_i}{\sum_{i=1}^{k} w_i}, \qquad (14)$$

where $w_i = 1/S^2(T_i)$. When each $T_i$ is based on a large sample, $T_{\cdot}$ is approximately normally distributed about $\theta$ with standard error $\sigma(T_{\cdot})$:

$$T_{\cdot} \sim N\big(\theta, \sigma^2(T_{\cdot})\big), \qquad (15)$$

where

$$\sigma(T_{\cdot}) = \Big(\sum_{i=1}^{k} w_i\Big)^{-1/2}. \qquad (16)$$

Thus a test for the statistical significance of the incremental validity uses the test statistic

$$Z = T_{\cdot}/\sigma(T_{\cdot}). \qquad (17)$$

If $\theta = 0$, then $Z$ has a standard normal distribution. If $Z$ exceeds the $100\alpha$ percent (one-tailed) critical value of the standard normal distribution, then the incremental validity $\theta$ is significantly greater than zero at significance level $\alpha$. For example, if $Z > 1.645$, the incremental validity is significant at the $\alpha = .05$ level of significance.

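A $100(1-\alpha)$ percent confidence interval for the incremental validity $\theta$ is given by

$$T_{\cdot} - z_{\alpha/2}\,\sigma(T_{\cdot}) \le \theta \le T_{\cdot} + z_{\alpha/2}\,\sigma(T_{\cdot}),$$

where $z_{\alpha/2}$ is the $100\alpha$ percent two-tailed critical value of the standard normal distribution. For example, if $\alpha = .05$, $z_{\alpha/2} = 1.96$ and a 95 percent confidence interval for $\theta$ is given by

$$T_{\cdot} - 1.96\,\sigma(T_{\cdot}) \le \theta \le T_{\cdot} + 1.96\,\sigma(T_{\cdot}).$$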
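Equations 14 through 17 and the confidence interval amount to a standard inverse-variance (fixed-effect) combination. The following is a minimal standard-library sketch; the function name `pool` and the study values in the example are hypothetical.

```python
import math
from statistics import NormalDist


def pool(estimates, std_errors, alpha=0.05):
    """Inverse-variance pooled estimate (Eq. 14), its standard error (Eq. 16),
    the one-tailed Z test (Eq. 17), and a two-sided 100(1 - alpha)% interval."""
    w = [1.0 / s**2 for s in std_errors]                   # w_i = 1 / S^2(T_i)
    t_dot = sum(wi * ti for wi, ti in zip(w, estimates)) / sum(w)
    se = 1.0 / math.sqrt(sum(w))
    z = t_dot / se
    z_one = NormalDist().inv_cdf(1 - alpha)                # one-tailed critical value
    z_two = NormalDist().inv_cdf(1 - alpha / 2)            # two-tailed critical value
    return {
        "T": t_dot,
        "se": se,
        "Z": z,
        "significant": z > z_one,
        "ci": (t_dot - z_two * se, t_dot + z_two * se),
    }
```

For two studies with estimates .02 and .03 and standard errors of .01 each, `pool([0.02, 0.03], [0.01, 0.01])` gives $T_{\cdot} = .025$, $\sigma(T_{\cdot}) \approx .0071$, and $Z \approx 3.54$, so the pooled increment is significant at $\alpha = .05$.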

Estimating the Variance Across Studies of Incremental Validities

It is convenient to treat the incremental validity parameters as if they were relatively constant across studies, i.e., to assume that $\theta_1 = \cdots = \theta_k$. It may be useful to test this assumption by computing an estimate of the variance (component) of the $\theta_i$ across studies. Formally, we may assume that $\theta_1, \ldots, \theta_k$ are a sample from a universe of possible incremental validities. This is consistent with the notion that the particular schools in which validity studies are conducted are a sample from a universe of possible schools, each with its own incremental validity parameter.

A simple estimate of the variance of the universe of $\theta$ values is

$$\hat\sigma_\theta^2 = \sum_{i=1}^{k} (T_i - \bar T)^2/(k-1) - \sum_{i=1}^{k} S^2(T_i)/k, \qquad (18)$$

where $\bar T$ is the unweighted mean of $T_1, \ldots, T_k$. Note that the first term is just the usual sample estimate of the variance of $T_1, \ldots, T_k$, and the second term is the average of the variances (squared standard errors) of the $T_i$. Note also that Equation 18 occasionally yields negative values, which are truncated to zero.
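Equation 18, including the truncation at zero, can be sketched as follows. The function name is hypothetical.

```python
def variance_component(estimates, std_errors):
    """Equation 18: sample variance of the T_i minus the mean squared standard
    error, truncated at zero."""
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two studies")
    mean_t = sum(estimates) / k                              # unweighted mean T-bar
    sample_var = sum((t - mean_t) ** 2 for t in estimates) / (k - 1)
    mean_sampling_var = sum(s**2 for s in std_errors) / k
    return max(0.0, sample_var - mean_sampling_var)
```

For estimates .01, .03, and .05, each with a standard error of .01, the sketch returns .0004 - .0001 = .0003. A zero result is consistent with a common parameter but is not by itself a test; the homogeneity statistic described next serves that purpose.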

A test of the statistical significance of $\sigma_\theta^2$ (that is, a test of the hypothesis $H_0\colon \sigma_\theta^2 = 0$) uses the statistic

$$H = \sum_{i=1}^{k} \frac{(T_i - T_{\cdot})^2}{S^2(T_i)}.$$

If $\sigma_\theta^2 = 0$, then $H$ has approximately a chi-square distribution with $k-1$ degrees of freedom. Thus if the computed value of $H$ exceeds the $100(1-\alpha)$ percentage point of the chi-square distribution with $k-1$ degrees of freedom, $\sigma_\theta^2$ is significantly greater than zero at significance level $\alpha$.

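It is usually simpler to compute $H$ via the computational formula

$$H = \sum_{i=1}^{k} w_i T_i^2 - \frac{\big(\sum_{i=1}^{k} w_i T_i\big)^2}{\sum_{i=1}^{k} w_i}, \qquad (19)$$

where $w_i = 1/S^2(T_i)$. This formula permits computation of $H$, as well as $T_{\cdot}$ and $\sigma(T_{\cdot})$, from the sums of the variables $w_i$, $w_i T_i$, and $w_i T_i^2$.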
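As the paragraph notes, three running sums are enough for Equation 19, the pooled estimate, and its standard error. A minimal sketch follows (hypothetical function name).

```python
import math


def homogeneity_h(estimates, std_errors):
    """Equation 19: H from the sums of w_i, w_i T_i and w_i T_i^2.

    Also returns the pooled estimate T. and its standard error, which the same
    three sums provide (Equations 14 and 16).
    """
    w = [1.0 / s**2 for s in std_errors]
    s0 = sum(w)
    s1 = sum(wi * t for wi, t in zip(w, estimates))
    s2 = sum(wi * t**2 for wi, t in zip(w, estimates))
    h = s2 - s1**2 / s0
    return h, s1 / s0, 1.0 / math.sqrt(s0)
```

For estimates .01, .03, and .05 with standard errors of .01, $H = 8$. This exceeds 5.99, the .95 point of the chi-square distribution with 2 degrees of freedom, so $\sigma_\theta^2 = 0$ would be rejected at $\alpha = .05$. If SciPy is available, `scipy.stats.chi2.ppf(0.95, 2)` gives the critical value. Because the computational formula subtracts two similar quantities, the definitional form of $H$ is numerically safer when the weights are very large.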


Power of Pooled Tests for Incremental Validity

The large-sample distribution given in Equation 15 can be used along with Equation 17 for the test statistic $Z$ and Equation 16 for $\sigma(T_{\cdot})$ to obtain the large-sample distribution of the parametric test for pooled incremental validity. This yields

$$Z \sim N\big(\theta/\sigma(T_{\cdot}),\ 1\big).$$

Hence the power of the test for incremental validity at significance level $\alpha$ based on the pooled estimate is the probability that a normal random variable with mean $\theta/\sigma(T_{\cdot})$ and variance 1 exceeds $z_\alpha$, the $100\alpha$ percent one-tailed critical value of the standard normal distribution. Thus the power is given by

$$1 - \Phi\big(z_\alpha - \theta/\sigma(T_{\cdot})\big), \qquad (20)$$

where $\Phi$ denotes the standard normal distribution function. Power computations can be made from Equation 20 whenever the expected validity increment $\theta$ is known and the standard errors necessary to compute $\sigma(T_{\cdot})$ have already been calculated.

Power for $R_1$ and $R_2$ Computed from the Same Sample

Let the population validities for the two test batteries be $\rho_1$ and $\rho_2$, respectively. Let $n_1, \ldots, n_k$ be the total sample sizes in the validity studies. Then, under the assumption stated above, the population value of the incremental validity in each study is $\theta = \rho_2 - \rho_1$ and the sampling variance of the estimate $T_i = R_2 - R_1$ in the $i$th study is

$$S^2(T_i) = A/n_i,$$

where $A = (1-\rho_1^2)^2 + (1-\rho_2^2)^2 - 2\,\mathrm{Cov}_\infty(R_1, R_2)$. The value of $A$ is essentially the quantity $n\,\sigma_\infty^2(d)$ of Equation 8, evaluated at the known values $\rho_1$ and $\rho_2$. Hence the sampling variance $\sigma^2(T_{\cdot})$ of the pooled estimate of incremental validity is

$$\sigma^2(T_{\cdot}) = A/N,$$

where $N = \sum_i n_i$ is the total (pooled) sample size across all $k$ studies.

This implies that the power of the test for pooled incremental validity is

$$1 - \Phi\left(z_\alpha - \frac{(\rho_2 - \rho_1)\sqrt{N}}{\sqrt{A}}\right). \qquad (21)$$


Note that this estimate of power depends only on the significance level, the two validities, $A$, and the total sample size across all $k$ studies. It does not depend directly on the number of predictor variables used to compute $R_1$ or $R_2$, but it is influenced by them through the covariance of $R_1$ and $R_2$ used to compute $A$. Equation 21 can be used to compute power values for a given level of incremental validity whenever $A$ can be computed. When the same sample is used to compute $R_1$ and $R_2$, the covariance of $R_1$ and $R_2$ is not zero. In fact, this covariance is usually quite large, particularly when the incremental validity is small. The reason is that the magnitudes of $R_1$ and $R_2$ tend to be correlated: if there is little incremental validity, samples that give a large value of $R_1$ also tend to give a large value of $R_2$.

However, the magnitude of the correlation between $R_1$ and $R_2$ also depends on the difference in the numbers of predictors in models 1 and 2. Specifically, the correlation (and covariance) generally decreases as more variables are included in model 2. For example, a typical value of the correlation between $R_1$ and $R_2$ for four added variables is .93 when $\rho_1 = .40$ and $\rho_2 = .45$, whereas a typical value for the same population validities would be .91 when the second model includes nine additional variables. Even these seemingly slight differences in correlation correspond to appreciable differences in the power of tests for incremental validity.
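Equation 21 can be evaluated once $A$ is known. Because Result 2 is needed for the exact covariance, this sketch instead takes the asymptotic correlation between $R_1$ and $R_2$ as an input, such as the typical value of .93 quoted above. It rebuilds the covariance as that correlation times $(1-\rho_1^2)(1-\rho_2^2)$, since $1-\rho^2$ is the asymptotic standard deviation of $\sqrt{n}R$. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def power_same_sample(rho1, rho2, n_total, corr_r1_r2, alpha=0.05):
    """Equation 21 for pooled studies that compute R1 and R2 on the same cases.

    corr_r1_r2 is the (assumed) asymptotic correlation between R1 and R2; the
    covariance in A is rebuilt from it as corr * (1 - rho1^2) * (1 - rho2^2).
    """
    s1, s2 = 1 - rho1**2, 1 - rho2**2
    a = s1**2 + s2**2 - 2 * corr_r1_r2 * s1 * s2
    z_alpha = NormalDist().inv_cdf(1 - alpha)          # one-tailed critical value
    shift = (rho2 - rho1) * math.sqrt(n_total / a)     # theta / sigma(T.)
    return 1 - NormalDist().cdf(z_alpha - shift)


for alpha in (0.05, 0.01):
    print(alpha, round(power_same_sample(0.40, 0.45, 250, 0.93, alpha), 2))
# 0.05 0.82
# 0.01 0.59
```

With the correlation of .93 quoted for four added variables, this reproduces the first row of Table 4 for $\rho_2 - \rho_1 = .05$. The correlation changes with $\rho_2$ and with the number of added predictors, so in practice it must be computed from Result 2 rather than reused.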

The estimate of the power of the pooled test for incremental validity given in Equation 21 was used to compute the power values given in Tables 2 through 4. These computations show that, when the incremental validity is .02 and the total sample size is at least $N = 1500$, the power of the test for nine added variables exceeds 95 percent at the $\alpha = .05$ level of significance and 85 percent at $\alpha = .01$. When only four variables are added and the incremental validity is .05, Table 4 shows that 95 percent power is achieved with fewer than 500 subjects when $\alpha = .05$, and with fewer than 750 subjects when $\alpha = .01$.

Because current plans for validity studies include sample sizes substantially larger than the minimum necessary for power of 95 percent for tests at the $\alpha = .05$ level of significance, current studies should have adequate power to detect pooled incremental validities of .02 or even smaller.
Table 2

Power of the Pooled Test for Incremental Validity
for Nine Additional Variables
as a Function of the Validity Increment and Pooled Sample Size
for R1 and R2 Computed from the Same Sample
and ρ1 = .40

       Significance Level α = .05    Significance Level α = .01
                 ρ2 - ρ1                       ρ2 - ρ1
     N        .05        .02                .05        .02
   250       0.72       0.40               0.45       0.17
   500       0.93       0.62               0.79       0.36
   750       0.99       0.77               0.93       0.53
  1000       1.00       0.87               0.98       0.67
  1250       1.00       0.93               1.00       0.78
  1500       1.00       0.96               1.00       0.86
  1750       1.00       0.98               1.00       0.91
  2000       1.00       0.99               1.00       0.94
  2200       1.00       0.99               1.00       0.96
  2400       1.00       1.00               1.00       0.98
  2500       1.00       1.00               1.00       0.98
  3000       1.00       1.00               1.00       0.99
  3500       1.00       1.00               1.00       1.00
  4000       1.00       1.00               1.00       1.00
  5000       1.00       1.00               1.00       1.00
  6000       1.00       1.00               1.00       1.00
  7000       1.00       1.00               1.00       1.00
  8000       1.00       1.00               1.00       1.00
  9000       1.00       1.00               1.00       1.00
 10000       1.00       1.00               1.00       1.00

Note: Power values listed as 1.00 are values greater than .995.
Table 3

Power of the Pooled Test for Incremental Validity
for Five Additional Variables
as a Function of the Validity Increment and Pooled Sample Size
for R1 and R2 Computed from the Same Sample
and ρ1 = .40

       Significance Level α = .05    Significance Level α = .01
                 ρ2 - ρ1                       ρ2 - ρ1
     N        .05        .02                .05        .02
   250       0.79       0.46               0.55       0.21
   500       0.97       0.70               0.87       0.44
   750       1.00       0.84               0.97       0.63
  1000       1.00       0.92               1.00       0.77
  1250       1.00       0.96               1.00       0.86
  1500       1.00       0.98               1.00       0.92
  1750       1.00       0.99               1.00       0.96
  2000       1.00       1.00               1.00       0.98
  2200       1.00       1.00               1.00       0.99
  2400       1.00       1.00               1.00       0.99
  2500       1.00       1.00               1.00       0.99
  3000       1.00       1.00               1.00       1.00
  3500       1.00       1.00               1.00       1.00
  4000       1.00       1.00               1.00       1.00
  5000       1.00       1.00               1.00       1.00
  6000       1.00       1.00               1.00       1.00
  7000       1.00       1.00               1.00       1.00
  8000       1.00       1.00               1.00       1.00
  9000       1.00       1.00               1.00       1.00
 10000       1.00       1.00               1.00       1.00

Note: Power values listed as 1.00 are values greater than .995.
Table 4

Power of the Pooled Test for Incremental Validity
for Four Additional Variables
as a Function of the Validity Increment and Pooled Sample Size
for R1 and R2 Computed from the Same Sample
and ρ1 = .40

       Significance Level α = .05    Significance Level α = .01
                 ρ2 - ρ1                       ρ2 - ρ1
     N        .05        .02                .05        .02
   250       0.82       0.48               0.59       0.23
   500       0.97       0.73               0.90       0.47
   750       1.00       0.87               0.98       0.66
  1000       1.00       0.94               1.00       0.80
  1250       1.00       0.97               1.00       0.89
  1500       1.00       0.99               1.00       0.94
  1750       1.00       0.99               1.00       0.97
  2000       1.00       1.00               1.00       0.98
  2200       1.00       1.00               1.00       0.99
  2400       1.00       1.00               1.00       1.00
  2500       1.00       1.00               1.00       1.00
  3000       1.00       1.00               1.00       1.00
  3500       1.00       1.00               1.00       1.00
  4000       1.00       1.00               1.00       1.00
  5000       1.00       1.00               1.00       1.00
  6000       1.00       1.00               1.00       1.00
  7000       1.00       1.00               1.00       1.00
  8000       1.00       1.00               1.00       1.00
  9000       1.00       1.00               1.00       1.00
 10000       1.00       1.00               1.00       1.00

Note: Power values listed as 1.00 are values greater than .995.
Power for $R_1$ and $R_2$ Computed from Independent Samples

Let the population validities for the two test batteries be $\rho_1$ and $\rho_2$, respectively. Let $n_1, \ldots, n_k$ be the total sample sizes in the validity studies, and assume that the two groups within each study are of equal size. The population value of the incremental validity is $\theta = \rho_2 - \rho_1$. The sampling variance of the estimate $T_i = R_2 - R_1$ in the $i$th study is

$$S^2(T_i) = 2A/n_i,$$

where $A = (1-\rho_1^2)^2 + (1-\rho_2^2)^2$. The covariance term is omitted because $R_1$ and $R_2$ are independent; therefore the numbers of predictors in the two models also do not affect the power of the test for incremental validity when independent samples are used. Thus the sampling variance $\sigma^2(T_{\cdot})$ of the pooled estimate of incremental validity is

$$\sigma^2(T_{\cdot}) = 2A/N,$$

where $N = \sum_i n_i$ is the total (pooled) sample size across all $k$ studies. This implies that the power of the test for pooled incremental validity is

$$1 - \Phi\left(z_\alpha - \frac{(\rho_2 - \rho_1)\sqrt{N}}{\sqrt{2A}}\right). \qquad (22)$$

The estimate given in Equation 22 was used to compute the power values given in Table 5. These computations show that the power of the pooled test for incremental validity is substantially lower when the studies use independent samples to compute $R_1$ and $R_2$ than when the same sample is used. For example, if the incremental validity is .02 and the $\alpha = .05$ level of significance is used, a total sample size of about $N = 45{,}000$ is needed to reach a power of 80 percent. If the incremental validity is .01, the power does not reach even 40 percent for pooled sample sizes as large as $N = 50{,}000$.
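Equation 22 needs no covariance term, so it can be evaluated directly. The sketch below uses hypothetical names. It also includes a simple grid search for the smallest pooled $N$ that reaches a target power, which is one way to read off sample-size requirements like the one quoted above.

```python
import math
from statistics import NormalDist


def power_independent(rho1, rho2, n_total, alpha=0.05):
    """Equation 22: R1 and R2 from independent, equal-sized halves of each study."""
    a = (1 - rho1**2) ** 2 + (1 - rho2**2) ** 2
    z_alpha = NormalDist().inv_cdf(1 - alpha)                 # one-tailed critical value
    shift = (rho2 - rho1) * math.sqrt(n_total / (2 * a))     # theta / sigma(T.)
    return 1 - NormalDist().cdf(z_alpha - shift)


def smallest_n(rho1, rho2, target=0.80, alpha=0.05, step=1000):
    """Smallest pooled N on a grid of `step` cases that reaches the target power."""
    if rho2 <= rho1:
        raise ValueError("requires a positive validity increment")
    n = step
    while power_independent(rho1, rho2, n, alpha) < target:
        n += step
    return n
```

`power_independent(0.40, 0.45, 1000)` is about .25, matching the first row of Table 5. `smallest_n(0.40, 0.42)` returns 43000 on a 1,000-case grid, which is consistent with the figure of about 45,000 read from the coarser grid of Table 5.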
Table 5

Power of the Pooled Test for Incremental Validity
as a Function of the Validity Increment and Pooled Sample Size
for R1 and R2 Computed from Independent Samples
and ρ1 = .40

       Significance Level α = .05    Significance Level α = .01
                 ρ2 - ρ1                       ρ2 - ρ1
     N        .05        .02                .05        .02
 1,000        .25        .10                .09        .03
 2,000        .39        .13                .17        .04
 3,000        .51        .16                .26        .05
 4,000        .61        .19                .34        .06
 5,000        .70        .21                .43        .07
 6,000        .76        .24                .51        .08
 7,000        .82        .26                .59        .09
 8,000        .86        .28                .66        .10
 9,000        .89        .31                .71        .12
10,000        .92        .33                .76        .13
15,000        .98        .43                .92        .20
20,000       1.00        .52                .98        .26
25,000       1.00        .60                .99        .33
30,000       1.00        .67               1.00        .40
35,000       1.00        .73               1.00        .47
40,000       1.00        .78               1.00        .53
45,000       1.00        .82               1.00        .59
50,000       1.00        .85               1.00        .64

Note: Power values listed as 1.00 are values greater than .995.



THEORETICAL  RESULTS 


In this section we derive the asymptotic distributions of the incremental validity indexes $d$ and $d^{*}$. We use these asymptotic distributions to obtain large-sample approximations to the distributions of these indices. We begin by stating a fundamental theorem. Then we use this theorem to obtain the asymptotic joint distribution of the determinants of certain correlation matrices. These distributions are then used to obtain the asymptotic joint distributions of $R_1$ and $R_2$, and of $R_1^2$ and $R_2^2$, which yield the asymptotic distributions of $d$ and $d^{*}$.

A Fundamental Theorem

Throughout  this  section  we  make  use  of  the  multivariate  delta  method, 
which  follows  from  the  fundamental  theorem  given  below.  This  theorem  is  a 
straightforward  generalization  of  Theorem  4.2.5  in  Anderson  (1958),  as  given,  for 
example,  in  Olkin  and  Siotani  (1976). 

Theorem: Let $u(n) = (u_1(n), \ldots, u_m(n))$ be a vector of random variables such that the limit in probability of $u(n)$ as $n \to \infty$ is $b = (b_1, \ldots, b_m)$ and $\sqrt{n}\,(u(n) - b)$ is asymptotically normally distributed with mean vector $0$ and covariance matrix $\Psi$. If $y(n) = (y_1(n), \ldots, y_k(n)) = (f_1, \ldots, f_k)$, $k \le m$, where the $f_i = f_i(u(n))$ are functions of $u(n)$ having first and second derivatives in a neighborhood of $u(n) = b$, then the asymptotic distribution of $\sqrt{n}\,[y(n) - f(b)]$ is given by

$$\sqrt{n}\,[y(n) - f(b)] \sim N(0, A\Psi A'),$$

where $A$ is a $k \times m$ matrix with elements

$$a_{ij} = \left.\frac{\partial f_i(u)}{\partial u_j}\right|_{u=b} \quad\text{and}\quad f(b) = (f_1(b), \ldots, f_k(b)).$$

In this paper, the vector $u(n)$ is composed of correlation coefficients (i.e., $u$ is the correlation matrix among tests and criteria, arranged as a vector). Because correlation coefficients of multivariate normal variates are functions of sample moments, they have an asymptotic joint distribution that is multivariate normal with a covariance matrix $\Psi$ that is a function of the population values of the correlations (Pearson & Filon, 1898). Thus the theorem gives a method for computing the asymptotic joint distribution of pairs of multiple correlations, or of any smooth functions of pairs of multiple correlations, such as the indexes of incremental validity considered here.
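The theorem is the multivariate delta method. The sketch below computes $A\Psi A'$ for a user-supplied function $f$, approximating the Jacobian $A$ by central differences. The numerical Jacobian is a convenience assumed here, not the analytic derivatives used in this report. The function name is hypothetical, and $\Psi$ is assumed to be given rather than built from the Pearson-Filon covariances.

```python
import numpy as np


def delta_method_cov(f, b, psi, eps=1e-6):
    """Asymptotic covariance A Psi A' of sqrt(n)[f(u(n)) - f(b)], with the
    Jacobian A (a_ij = d f_i / d u_j at u = b) taken by central differences."""
    b = np.asarray(b, dtype=float)
    fb = np.atleast_1d(f(b))
    jac = np.empty((fb.size, b.size))
    for j in range(b.size):
        step = np.zeros_like(b)
        step[j] = eps
        jac[:, j] = (np.atleast_1d(f(b + step)) - np.atleast_1d(f(b - step))) / (2 * eps)
    return jac @ np.asarray(psi, dtype=float) @ jac.T
```

As a check against the formulas used earlier, the asymptotic variance of $\sqrt{n}R^2$ is $4\rho^2(1-\rho^2)^2$. Mapping $R^2$ to $R$ with $f(u) = \sqrt{u}$ then recovers $(1-\rho^2)^2$, for example $0.84^2 = .7056$ when $\rho = .40$. For the linear map $f(R_1, R_2) = R_2 - R_1$, the result is exactly $\mathrm{Var}(R_1) + \mathrm{Var}(R_2) - 2\,\mathrm{Cov}(R_1, R_2)$.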

The Sampling Distribution of Incremental Validities

In this section we apply the fundamental theorem to obtain the sampling distribution of incremental validity indices in large samples. We do so in three
steps.  First,  we  obtain  the  asymptotic  joint  distribution  of  the  determinants 
of  correlation  matrices.  We  use  the  joint  distribution  of  the  determinants  to 
obtain  the  asymptotic  joint  distribution  of  two  multiple  correlations  and  that 
of  two  squared  multiple  correlations.  Finally,  we  use  the  joint  distribution  of 
two  multiple  correlations  to  obtain  the  asymptotic  distribution  of  the 
incremental  validity  indices. 
Notation


Let $X_0, X_1, X_2, \ldots, X_m$ be a collection of random variables with a
joint multivariate normal distribution, where $X_0$ represents the criterion
variable and $X_1, \ldots, X_m$ represent predictor variables. We denote the
sample and population correlations between $X_i$ and $X_j$ by $r_{ij}$ and
$\rho_{ij}$, respectively. We denote a matrix of correlations by defining the set
of variables to be correlated. Specifically, for $k \ge 1$, let
$\alpha_1, \ldots, \alpha_k$ denote distinct subsets of the set of integers
$\{0, 1, \ldots, m\}$. Then each $\alpha_i$ defines a set of variables: the set
of variables whose subscripts are contained in $\alpha_i$. We use the notation
$R(\alpha_1), \ldots, R(\alpha_k)$ to denote the square matrices of sample
correlations of the variables implied by the sets $\alpha_1, \ldots, \alpha_k$,
and $P(\alpha_1), \ldots, P(\alpha_k)$ to denote the corresponding population
correlation matrices. We will also use the notation $R(0, \alpha_i)$ to denote
the matrix of correlations of $X_0$ and the variables implied by $\alpha_i$
instead of the more formal $R(\{0\} \cup \alpha_i)$.

Result 1: Joint Distribution of Determinants of Correlation Sub-matrices

Let $X_0, X_1, \ldots, X_m$ be random variables (representing a criterion and
test scales) that have a joint multivariate normal distribution. Let
$\alpha_1, \ldots, \alpha_k$ be nonempty sets of the integers between 0 and $m$
inclusive, denoting collections of the $m$ subtests, possibly including the
criterion. Thus $R(\alpha_1), \ldots, R(\alpha_k)$ are the sample correlation
matrices of the variables implied by $\alpha_1, \ldots, \alpha_k$, respectively.
Then the asymptotic joint distribution of $|R(\alpha_1)|, \ldots, |R(\alpha_k)|$,
when all of the determinants are computed from correlations based on the same
sample of size $n$, is given by

$$\sqrt{n}\,\big[(|R(\alpha_1)|, \ldots, |R(\alpha_k)|) - (|P(\alpha_1)|, \ldots, |P(\alpha_k)|)\big] \;\to\; N(0, \Sigma),$$

where $0$ is a $k \times 1$ vector of zeros and $\Sigma = (\sigma_{ij})$ with

$$\sigma_{ij} = \sum_{\substack{s, t \in \alpha_i \\ s < t}} \;\sum_{\substack{u, v \in \alpha_j \\ u < v}}
4\,|P(\alpha_i)|\,|P(\alpha_j)|\,\rho_{(i)}^{st}\,\rho_{(j)}^{uv}
\Big\{ \tfrac{1}{2}\rho_{st}\rho_{uv}\big(\rho_{su}^2 + \rho_{sv}^2 + \rho_{tu}^2 + \rho_{tv}^2\big)
+ \rho_{su}\rho_{tv} + \rho_{sv}\rho_{tu}
- \big(\rho_{st}\rho_{su}\rho_{sv} + \rho_{st}\rho_{tu}\rho_{tv} + \rho_{su}\rho_{tu}\rho_{uv} + \rho_{sv}\rho_{tv}\rho_{uv}\big) \Big\},$$

where the sums in $\sigma_{ij}$ are taken so that $s < t$ and $u < v$,
$\rho_{(i)}^{st}$ is the element in row $s$ and column $t$ of $P^{-1}(\alpha_i)$,
the inverse of $P(\alpha_i)$, and $\rho_{ss} = 1$.
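Result 1 can be evaluated directly. The Python/NumPy sketch below is illustrative only, and the function names `pf_cov` and `det_cov_result1` are hypothetical. It codes the Pearson-Filon term in braces and sums it over the pairs $s < t$ in $\alpha_i$ and $u < v$ in $\alpha_j$. Index sets are lists of positions in the full population correlation matrix `P`, with position 0 for the criterion. For a single pair of variables the formula reduces to $4\rho^2(1-\rho^2)^2$, which serves as a check.

```python
import itertools
import numpy as np

def pf_cov(P, s, t, u, v):
    """Asymptotic n*Cov(r_st, r_uv) for multivariate normal data (term in braces)."""
    return (0.5 * P[s, t] * P[u, v] * (P[s, u]**2 + P[s, v]**2 + P[t, u]**2 + P[t, v]**2)
            + P[s, u] * P[t, v] + P[s, v] * P[t, u]
            - (P[s, t] * P[s, u] * P[s, v] + P[s, t] * P[t, u] * P[t, v]
               + P[s, u] * P[t, u] * P[u, v] + P[s, v] * P[t, v] * P[u, v]))

def det_cov_result1(P, a_i, a_j):
    """sigma_ij of Result 1: asymptotic n*Cov(|R(a_i)|, |R(a_j)|)."""
    Pi, Pj = P[np.ix_(a_i, a_i)], P[np.ix_(a_j, a_j)]
    inv_i, inv_j = np.linalg.inv(Pi), np.linalg.inv(Pj)
    total = 0.0
    # combinations() visits each unordered pair once (s < t when the list is sorted);
    # the summand is symmetric in s <-> t and in u <-> v, so the order does not matter
    for (x, s), (y, t) in itertools.combinations(enumerate(a_i), 2):
        for (w, u), (z, v) in itertools.combinations(enumerate(a_j), 2):
            total += inv_i[x, y] * inv_j[w, z] * pf_cov(P, s, t, u, v)
    return 4.0 * np.linalg.det(Pi) * np.linalg.det(Pj) * total

P2x2 = np.array([[1.0, 0.5], [0.5, 1.0]])
var_det = det_cov_result1(P2x2, [0, 1], [0, 1])   # 4 * 0.25 * 0.75**2 = 0.5625
```

A set with a single variable contains no pairs, so its determinant is the constant 1 and every covariance involving it is zero.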

A somewhat simpler form of the asymptotic variance of the determinant of a
correlation matrix was derived by Olkin and Siotani (1976). We use a slightly
more complex expression than needed for the variance terms to define all
elements of $\Sigma$ because it generalizes more easily to the situation in which
some correlations are known. The covariance terms are obtained by applying the
multivariate delta method to the joint distribution of the correlation matrix.
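Apply the method by writing the nonredundant elements of the correlation matrix
as a vector
$r = (r_{01}, r_{02}, \ldots, r_{0m}, r_{12}, \ldots, r_{1m}, r_{23}, \ldots, r_{(m-1)m})'$.
Then if $|R(\alpha_i)| = f_i(r)$, the asymptotic covariance matrix is computed
from the partial derivatives of the $f_i$ with respect to the elements of $r$.
Since $R(\alpha_i)$ is symmetric, each off-diagonal correlation $r_{st}$ enters it
twice, so that

$$\frac{\partial |R(\alpha_i)|}{\partial r_{st}} = 2\,|R(\alpha_i)|\,r_{(i)}^{st},$$

where $r_{(i)}^{st}$ is the element in row $s$ and column $t$ of $R^{-1}(\alpha_i)$.
Hence

$$\sigma_{ij} = 4 \sum_{s<t} \sum_{u<v} |P(\alpha_i)|\,|P(\alpha_j)|\,\rho_{(i)}^{st}\,\rho_{(j)}^{uv}\,\mathrm{Cov}(r_{st}, r_{uv}), \tag{23}$$

where $\mathrm{Cov}(r_{st}, r_{uv})$ is the covariance of $r_{st}$ and $r_{uv}$ from
the asymptotic covariance matrix of $r$, and $s, t \in \alpha_i$ and
$u, v \in \alpha_j$. Substituting the expression given, for example, by Pearson
and Filon (1898) for $\mathrm{Cov}(r_{st}, r_{uv})$ yields the result given in
the theorem.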

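Equation 23 also can be written as

$$\sigma_{ij} = \sum_{s}\sum_{t}\sum_{u}\sum_{v} |P(\alpha_i)|\,|P(\alpha_j)|\,\rho_{(i)}^{st}\,\rho_{(j)}^{uv}\,\mathrm{Cov}(r_{st}, r_{uv}),$$

where no index is restricted to be less than another. Terms with $s = t$ or
$u = v$ vanish because $r_{ss} = 1$ is fixed.

Using the notation $\rho_{ij\cdot k} = \rho_{ij} - \rho_{ik}\rho_{jk}$,

$$\mathrm{Cov}(r_{st}, r_{uv}) = \tfrac{1}{2}\big(\rho_{su\cdot t}\,\rho_{tv\cdot u} + \rho_{sv\cdot u}\,\rho_{tu\cdot s} + \rho_{su\cdot v}\,\rho_{tv\cdot s} + \rho_{sv\cdot t}\,\rho_{tu\cdot v}\big).$$

By symmetry (after relabeling the summation indices, the four terms contribute
equally),

$$\sigma_{ij} = 2 \sum_{s}\sum_{t}\sum_{u}\sum_{v} |P(\alpha_i)|\,|P(\alpha_j)|\,\rho_{(i)}^{st}\,\rho_{(j)}^{uv}\,\rho_{su\cdot t}\,\rho_{tv\cdot u}. \tag{24}$$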

When Some Correlations Are Known. When some of the elements of the $R(\alpha_i)$
are known, the formal computation of the asymptotic distribution remains the same
as given above, except that each term of the sum involving a known $r_{st}$
vanishes.

When All Correlations Are Estimated. If all of the correlations are estimated,
$\sigma_{ij}$ can be expressed in the form of simple matrix multiplications. To
see this, expand Equation 24 and carry out the indicated multiplications by the
inverse elements, using $\sum_s \rho_{(i)}^{ts}\rho_{st} = 1$:

$$\sigma_{ij} = 2\,|P(\alpha_i)|\,|P(\alpha_j)| \sum_{t}\sum_{u}
\Big[\Big(\sum_{s} \rho_{(i)}^{ts}\rho_{su}\Big) - \rho_{tu}\Big]
\Big[\Big(\sum_{v} \rho_{(j)}^{uv}\rho_{tv}\Big) - \rho_{tu}\Big].$$

Let $P_{ij}$ be the correlation matrix between the variables in set $\alpha_i$
and set $\alpha_j$, with $P_{ii} = P(\alpha_i)$. Let
$A = (P_{ii}^{-1} - I)\,P_{ij}$ and $B = P_{ij}\,(P_{jj}^{-1} - I)$. Then

$$\sigma_{ij} = 2\,|P(\alpha_i)|\,|P(\alpha_j)|\,\mathrm{tr}(AB') = 2\,|P(\alpha_i)|\,|P(\alpha_j)|\;A \cdot B, \tag{25}$$

where the dot product operator $\cdot$ denotes the sum of the products of the
corresponding elements in the two matrices.
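Equations 24 and 25 can be checked against each other numerically. The sketch below uses Python with NumPy. The function names and the example correlation matrix are hypothetical. It computes $\sigma_{ij}$ once by the unrestricted fourfold sum of Equation 24 and once by the matrix expression of Equation 25. For any positive definite correlation matrix the two results should agree up to rounding error.

```python
import itertools
import numpy as np

def sigma_eq24(P, a_i, a_j):
    """sigma_ij by Equation 24 (fourfold sum over unrestricted indices)."""
    inv_i = np.linalg.inv(P[np.ix_(a_i, a_i)])
    inv_j = np.linalg.inv(P[np.ix_(a_j, a_j)])

    def partial(x, y, z):
        # rho_{xy.z} = rho_xy - rho_xz * rho_yz
        return P[x, y] - P[x, z] * P[y, z]

    total = 0.0
    for (x, s), (y, t) in itertools.product(enumerate(a_i), repeat=2):
        for (w, u), (z, v) in itertools.product(enumerate(a_j), repeat=2):
            total += inv_i[x, y] * inv_j[w, z] * partial(s, u, t) * partial(t, v, u)
    det_i = np.linalg.det(P[np.ix_(a_i, a_i)])
    det_j = np.linalg.det(P[np.ix_(a_j, a_j)])
    return 2.0 * det_i * det_j * total

def sigma_eq25(P, a_i, a_j):
    """sigma_ij by Equation 25: 2 |P_ii| |P_jj| A.B (all correlations estimated)."""
    P_ii, P_jj, P_ij = P[np.ix_(a_i, a_i)], P[np.ix_(a_j, a_j)], P[np.ix_(a_i, a_j)]
    A = (np.linalg.inv(P_ii) - np.eye(len(a_i))) @ P_ij
    B = P_ij @ (np.linalg.inv(P_jj) - np.eye(len(a_j)))
    # A.B = sum of elementwise products = tr(A B')
    return 2.0 * np.linalg.det(P_ii) * np.linalg.det(P_jj) * np.sum(A * B)

P = np.array([[1.0, 0.3, 0.4, 0.2],
              [0.3, 1.0, 0.3, 0.1],
              [0.4, 0.3, 1.0, 0.2],
              [0.2, 0.1, 0.2, 1.0]])
s24 = sigma_eq24(P, [0, 1, 2], [1, 2, 3])
s25 = sigma_eq25(P, [0, 1, 2], [1, 2, 3])
```

The matrix form needs only two inversions and a few products, so it scales much better than the fourfold sum. However, it assumes that every correlation is estimated.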


Result 2: Population Covariances of Multiple Correlations Estimated from the Same Sample

Let $X_0, X_1, \ldots, X_m$ be random variables with a joint multivariate normal
distribution. Let $R_1$ be the sample multiple correlation of a subset of
variables identified by the set of indexes $\alpha_1$ with $X_0$, and let $R_2$ be
the sample multiple correlation of another subset of variables identified by the
set $\alpha_2$ of indexes with $X_0$, where both correlations are computed from
the same sample of size $n$. Let $P_1$ and $P_2$ be the population multiple
correlations corresponding to $R_1$ and $R_2$, respectively. Then the asymptotic
joint distribution of $R_1^2$ and $R_2^2$ is given by

$$\sqrt{n}\,\big[(R_1^2, R_2^2) - (P_1^2, P_2^2)\big] \;\to\; N(0, \Sigma),$$

where $\Sigma = (\sigma_{ij})$,

$$\sigma_{11} = \frac{c_1}{|P(\alpha_1)|^2} + \frac{c_2\,|P(0,\alpha_1)|^2}{|P(\alpha_1)|^4} - \frac{2c_3\,|P(0,\alpha_1)|}{|P(\alpha_1)|^3},$$

$$\sigma_{22} = \frac{c_4}{|P(\alpha_2)|^2} + \frac{c_5\,|P(0,\alpha_2)|^2}{|P(\alpha_2)|^4} - \frac{2c_6\,|P(0,\alpha_2)|}{|P(\alpha_2)|^3},$$

$$\sigma_{12} = \frac{C}{|P(\alpha_1)|\,|P(\alpha_2)|},$$

$$C = c_7 - \frac{c_8\,|P(0,\alpha_1)|}{|P(\alpha_1)|} - \frac{c_9\,|P(0,\alpha_2)|}{|P(\alpha_2)|} + \frac{c_{10}\,|P(0,\alpha_1)|\,|P(0,\alpha_2)|}{|P(\alpha_1)|\,|P(\alpha_2)|},$$

with the variances and covariances

$c_1 = \mathrm{Var}(|R(0,\alpha_1)|)$, $c_2 = \mathrm{Var}(|R(\alpha_1)|)$,
$c_3 = \mathrm{Cov}(|R(\alpha_1)|, |R(0,\alpha_1)|)$,
$c_4 = \mathrm{Var}(|R(0,\alpha_2)|)$, $c_5 = \mathrm{Var}(|R(\alpha_2)|)$,
$c_6 = \mathrm{Cov}(|R(\alpha_2)|, |R(0,\alpha_2)|)$,
$c_7 = \mathrm{Cov}(|R(0,\alpha_1)|, |R(0,\alpha_2)|)$,
$c_8 = \mathrm{Cov}(|R(\alpha_1)|, |R(0,\alpha_2)|)$,
$c_9 = \mathrm{Cov}(|R(0,\alpha_1)|, |R(\alpha_2)|)$,
$c_{10} = \mathrm{Cov}(|R(\alpha_1)|, |R(\alpha_2)|)$,

as given in Result 1.


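The asymptotic joint distribution of $R_1$ and $R_2$ is given by

$$\sqrt{n}\,\big[(R_1, R_2) - (P_1, P_2)\big] \;\to\; N(0, T),$$

where $T = (\tau_{ij})$,

$$\tau_{11} = \frac{c_1}{4|P(\alpha_1)|^2 P_1^2} + \frac{c_2\,|P(0,\alpha_1)|^2}{4|P(\alpha_1)|^4 P_1^2} - \frac{c_3\,|P(0,\alpha_1)|}{2|P(\alpha_1)|^3 P_1^2},$$

$$\tau_{22} = \frac{c_4}{4|P(\alpha_2)|^2 P_2^2} + \frac{c_5\,|P(0,\alpha_2)|^2}{4|P(\alpha_2)|^4 P_2^2} - \frac{c_6\,|P(0,\alpha_2)|}{2|P(\alpha_2)|^3 P_2^2},$$

$$\tau_{12} = \frac{C}{4|P(\alpha_1)|\,|P(\alpha_2)|\,P_1 P_2},$$

and $c_1, \ldots, c_6$ and $C$ are given above.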

The result is proven by writing the multiple correlations as functions of the
determinants of correlation matrices:

$$R_1^2 = 1 - \frac{|R(0,\alpha_1)|}{|R(\alpha_1)|}, \qquad R_2^2 = 1 - \frac{|R(0,\alpha_2)|}{|R(\alpha_2)|}.$$

Since $R_1^2$ and $R_2^2$ are functions of $|R(\alpha_1)|$, $|R(0,\alpha_1)|$,
$|R(\alpha_2)|$, and $|R(0,\alpha_2)|$, the joint distributions of
$(R_1^2, R_2^2)$ and $(R_1, R_2)$ are derived by applying the delta method using
Result 1.
Note that $\tau_{ij} = \sigma_{ij}/(4P_iP_j)$. Hence the asymptotic correlation
of $R_1$ and $R_2$ equals that of $R_1^2$ and $R_2^2$:

$$\rho(R_1, R_2) = \rho(R_1^2, R_2^2).$$

We use a more complex expression for the asymptotic variances $\sigma_{11}$ and
$\sigma_{22}$ than is strictly necessary because the expression given above
generalizes more easily to the case when some of the bivariate correlations are
known. If none of the correlations are known, then

$$\sigma_{ii} = 4P_i^2(1 - P_i^2)^2 \qquad\text{and}\qquad \tau_{ii} = (1 - P_i^2)^2.$$
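The determinant-based expressions of Result 2 can be assembled mechanically. The sketch below uses Python with NumPy, and its names are hypothetical. It obtains $c_1, \ldots, c_{10}$ from the matrix form of Equation 25, which assumes that all correlations are estimated. It then builds $\Sigma$ and $T$ and checks the closed forms $\sigma_{ii} = 4P_i^2(1-P_i^2)^2$ and $\tau_{ii} = (1-P_i^2)^2$ stated above. Variable 0 is the criterion, and the example correlation matrix is invented for illustration.

```python
import numpy as np

def det_cov(P, a_i, a_j):
    """Asymptotic n*Cov(|R(a_i)|, |R(a_j)|) via Equation 25 (all correlations estimated)."""
    P_ii, P_jj, P_ij = P[np.ix_(a_i, a_i)], P[np.ix_(a_j, a_j)], P[np.ix_(a_i, a_j)]
    A = (np.linalg.inv(P_ii) - np.eye(len(a_i))) @ P_ij
    B = P_ij @ (np.linalg.inv(P_jj) - np.eye(len(a_j)))
    return 2.0 * np.linalg.det(P_ii) * np.linalg.det(P_jj) * np.sum(A * B)

def result2(P, a1, a2):
    """Return (Sigma, T, P1, P2) of Result 2 for predictor index sets a1 and a2."""
    def det(a):
        return np.linalg.det(P[np.ix_(a, a)])

    def c(a, b):
        return det_cov(P, a, b)

    z1, z2 = [0] + list(a1), [0] + list(a2)          # index sets of R(0, a1), R(0, a2)
    D1, D01, D2, D02 = det(a1), det(z1), det(a2), det(z2)
    s11 = c(z1, z1) / D1**2 + c(a1, a1) * D01**2 / D1**4 - 2 * c(a1, z1) * D01 / D1**3
    s22 = c(z2, z2) / D2**2 + c(a2, a2) * D02**2 / D2**4 - 2 * c(a2, z2) * D02 / D2**3
    C = (c(z1, z2) - c(a1, z2) * D01 / D1 - c(z1, a2) * D02 / D2
         + c(a1, a2) * D01 * D02 / (D1 * D2))
    s12 = C / (D1 * D2)
    P1, P2 = np.sqrt(1 - D01 / D1), np.sqrt(1 - D02 / D2)  # population multiple correlations
    Sigma = np.array([[s11, s12], [s12, s22]])
    T = Sigma / (4 * np.outer([P1, P2], [P1, P2]))         # tau_ij = sigma_ij / (4 P_i P_j)
    return Sigma, T, P1, P2

P = np.array([[1.0, 0.3, 0.4, 0.2],
              [0.3, 1.0, 0.3, 0.1],
              [0.4, 0.3, 1.0, 0.2],
              [0.2, 0.1, 0.2, 1.0]])
Sigma, T, P1, P2 = result2(P, [1, 2], [1, 2, 3])
```

The off-diagonal elements `Sigma[0, 1]` and `T[0, 1]` are the quantities with no simple closed form. They are what Result 3 needs.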

Result 3: Population Variances of Multiple Correlation Differences Estimated from the Same Sample

Let $X_0, X_1, \ldots, X_m$ be random variables with a joint multivariate normal
distribution. Let $\alpha_1$ be a nonempty subset of the set of integers
$\{1, 2, \ldots, m\}$ defining (via the subscripts) a subset of the collection of
variables. Let $\alpha_2$ be a distinct nonempty subset of $\{1, 2, \ldots, m\}$
defining another subset of the variables $X_1, \ldots, X_m$. Let $R_1$ be the
sample multiple correlation of $X_0$ with the variables defined by $\alpha_1$, and
let $R_2$ be the sample multiple correlation of $X_0$ with the variables defined
by $\alpha_2$. Let $P_1$ and $P_2$ be the population multiple correlations
corresponding to $R_1$ and $R_2$. Then the asymptotic distribution of
$d = R_2 - R_1$ is given by

$$\sqrt{n}\,(d - \delta) \;\to\; N(0, \sigma_d^2),$$

where $\delta = P_2 - P_1$,

$$\sigma_d^2 = \tau_{11} + \tau_{22} - 2\tau_{12},$$

and the $\tau_{ij}$ are given in Result 2. The asymptotic distribution of
$d^* = R_2^2 - R_1^2$ is given by

$$\sqrt{n}\,(d^* - \delta^*) \;\to\; N(0, \sigma_{d^*}^2),$$

where $\delta^* = P_2^2 - P_1^2$,

$$\sigma_{d^*}^2 = \sigma_{11} + \sigma_{22} - 2\sigma_{12}, \tag{26}$$

and the $\sigma_{ij}$ are given in Result 2.

This result is obtained directly by applying the delta method using the
asymptotic distribution given in Result 2.
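Because $d$ and $d^*$ are linear in $(R_1, R_2)$ and $(R_1^2, R_2^2)$, the delta method reduces here to a quadratic form $g' T g$ (or $g' \Sigma g$) with gradient $g = (-1, 1)'$. The sketch below is minimal and assumes that the $2 \times 2$ matrices $T$ and $\Sigma$ of Result 2 are already available. The function name and the numbers in the usage line are hypothetical.

```python
def diff_variance(M):
    """Asymptotic variance of R2 - R1 (M = T) or of R2^2 - R1^2 (M = Sigma).

    M is a symmetric 2 x 2 nested list [[m11, m12], [m12, m22]]. The gradient
    of the difference is g = (-1, 1), so g'Mg = m11 + m22 - 2 m12.
    """
    g = (-1.0, 1.0)
    return sum(g[i] * M[i][j] * g[j] for i in range(2) for j in range(2))

T = [[0.70, 0.45], [0.45, 0.68]]   # hypothetical tau_ij values
var_d = diff_variance(T)           # tau11 + tau22 - 2 tau12
```

For a sample of size $n$, the large-sample variance of $d$ itself is `var_d / n`.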

Result 4: Population Variances of Multiple Correlation Differences Estimated from Independent
Samples

Let $X_0, X_1, \ldots, X_m$ be random variables with a joint multivariate normal
distribution, and let $\alpha_1$ and $\alpha_2$ be distinct subsets of the
integers $\{1, \ldots, m\}$ such that each defines a distinct set of the variables
$X_1, \ldots, X_m$. Let $R_1$ be the multiple correlation of $X_0$ with the
variables defined in $\alpha_1$, computed from a sample of size $n_1$, and let
$R_2$ be the multiple correlation of $X_0$ with the variables defined by
$\alpha_2$, computed from an independent sample of size $n_2$. Let $P_1$ and $P_2$
be the population multiple correlations corresponding to $R_1$ and $R_2$. Then if
$n = n_1 + n_2$ and $\pi_1 = n_1/n$ and $\pi_2 = n_2/n$ remain fixed as
$n \to \infty$, the asymptotic distributions of $d = R_2 - R_1$ and
$d^* = R_2^2 - R_1^2$ are given by

$$\sqrt{n}\,(d - \delta) \;\to\; N(0, \sigma_d^2)$$

and

$$\sqrt{n}\,(d^* - \delta^*) \;\to\; N(0, \sigma_{d^*}^2),$$

where $\delta = P_2 - P_1$, $\delta^* = P_2^2 - P_1^2$, and

$$\sigma_d^2 = \frac{(1 - P_1^2)^2}{\pi_1} + \frac{(1 - P_2^2)^2}{\pi_2},
\qquad
\sigma_{d^*}^2 = \frac{4P_1^2(1 - P_1^2)^2}{\pi_1} + \frac{4P_2^2(1 - P_2^2)^2}{\pi_2}.$$

This result follows directly from the asymptotic distributions of
$\sqrt{n_i}\,(R_i - P_i)$ and $\sqrt{n_i}\,(R_i^2 - P_i^2)$, $i = 1, 2$, and the
statistical independence of $R_1$ and $R_2$.
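The Result 4 variances are simple enough to compute by hand. The following minimal sketch in plain Python is included for completeness. The function name and the example values of $P_1$, $P_2$, $n_1$, and $n_2$ are hypothetical.

```python
def independent_diff_variances(P1, P2, n1, n2):
    """Result 4: asymptotic variances of sqrt(n)(d - delta) and sqrt(n)(d* - delta*),
    with n = n1 + n2, when R1 and R2 come from independent samples.

    Returns (sigma_d^2, sigma_d*^2).
    """
    n = n1 + n2
    pi1, pi2 = n1 / n, n2 / n
    var_d = (1 - P1**2) ** 2 / pi1 + (1 - P2**2) ** 2 / pi2
    var_dstar = (4 * P1**2 * (1 - P1**2) ** 2 / pi1
                 + 4 * P2**2 * (1 - P2**2) ** 2 / pi2)
    return var_d, var_dstar

# Hypothetical values: P1 = .40, P2 = .42, n1 = 470, n2 = 530
var_d, var_dstar = independent_diff_variances(0.40, 0.42, 470, 530)
se_d = (var_d / (470 + 530)) ** 0.5    # large-sample standard error of d
```

Dividing $\sigma_d^2$ by $n$ gives $(1-P_1^2)^2/n_1 + (1-P_2^2)^2/n_2$, which is simply the sum of the two independent large-sample variances of $R_1$ and $R_2$.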

Using the Theoretical Results with Estimated Variances

Results  3  and  4  give  the  asymptotic  distributions  of  incremental  validities 
in  which  the  asymptotic  variance  is  a  function  of  the  matrix  of  population 
correlations  among  variables.  Thus  this  distribution  theory  is  of  little  use 
when  (as  in  any  real  application)  the  entire  matrix  of  population  correlations 
is  not  known.  To  use  these  results,  it  is  necessary  to  show  that  estimating  the 
asymptotic  variances  from  sample  correlations  still  yields  a  valid  asymptotic 
distribution. 


Result 5: Sample Variances of Multiple Correlation Differences Estimated from the Same Sample

Suppose that the conditions stated in Result 3 obtain. Define $s_d^2$ and
$s_{d^*}^2$ as the estimates of $\sigma_d^2$ and $\sigma_{d^*}^2$ that would be
computed by using the corresponding sample correlation coefficients in place of
the population correlations. Hence $s_d^2$ and $s_{d^*}^2$ are random variables
depending on the sample correlations. Then the following asymptotic
distributions hold as $n \to \infty$:

$$\sqrt{n}\,(d - \delta)/s_d \;\to\; N(0, 1)$$

and

$$\sqrt{n}\,(d^* - \delta^*)/s_{d^*} \;\to\; N(0, 1).$$

These asymptotic distributions imply that, in large samples, approximately

$$d \sim N(\delta,\; s_d^2/n) \qquad\text{and}\qquad d^* \sim N(\delta^*,\; s_{d^*}^2/n).$$
Result 6: Sample Variances of Multiple Correlation Differences Estimated from Independent Samples

Suppose that the conditions of Result 4 obtain. Define $s_d^2$ and $s_{d^*}^2$
as the estimates of $\sigma_d^2$ and $\sigma_{d^*}^2$ obtained by substituting the
stochastically independent sample multiple correlations $R_1$ and $R_2$ for the
population multiple correlations $P_1$ and $P_2$, respectively. Then the
following asymptotic distributions hold as $n \to \infty$:

$$\sqrt{n}\,(d - \delta)/s_d \;\to\; N(0, 1)$$

and

$$\sqrt{n}\,(d^* - \delta^*)/s_{d^*} \;\to\; N(0, 1).$$

These asymptotic distributions imply that, in large samples, approximately

$$d \sim N(\delta,\; s_d^2/n) \qquad\text{and}\qquad d^* \sim N(\delta^*,\; s_{d^*}^2/n).$$

Results 5 and 6 follow from Results 3 and 4 by noting that the sample
correlation matrix $R$ converges in probability to the population correlation
matrix $P$, and that $\sigma_d^2$ and $\sigma_{d^*}^2$ are continuous functions of
the elements of $P$ (see, e.g., Rao, 1973, p. 385, Theorem 6a.2(i)).
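In practice, the plug-in standard error is used to form a large-sample z statistic. The sketch below covers the independent-samples case of Result 6 and uses only the Python standard library. It substitutes the sample $R_1$ and $R_2$ for $P_1$ and $P_2$ in the Result 4 variance. The function name and the example values are hypothetical.

```python
from statistics import NormalDist

def independent_z_test(R1, R2, n1, n2):
    """Plug-in z statistic for d = R2 - R1 from independent samples (Result 6).

    s_d^2 / n = (1 - R1^2)^2 / n1 + (1 - R2^2)^2 / n2 estimates Var(d).
    Returns (d, standard error, z, upper-tail p value).
    """
    d = R2 - R1
    se = ((1 - R1**2) ** 2 / n1 + (1 - R2**2) ** 2 / n2) ** 0.5
    z = d / se
    p_upper = 1 - NormalDist().cdf(z)
    return d, se, z, p_upper

d, se, z, p = independent_z_test(0.30, 0.40, 500, 500)
```

With $R_1 = .30$, $R_2 = .40$, and 500 cases in each sample, $z$ is about 1.81 and the upper-tail $p$ is about .035. Because this is an asymptotic result, it should be used only when both samples are large.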

Results  When  Some  Correlations  Are  Known 

In some situations, some of the correlations will be known with a very high
degree of precision. For example, if a test battery has been widely used for
some extended period, the correlations among tests in the battery may be
essentially known. That is, for some pairs $(i, j)$ we may know the value of the
corresponding population correlation $\rho_{ij}$. In such cases, it is desirable
to increase the precision of estimates of incremental validity by utilizing the
fact that some of the correlations are known.

We  compute  estimates  of  multiple  correlations  and  incremental  validity  when 
some  correlations  are  known  by  substituting  the  values  of  the  known  correlations 
for  their  sample  estimates.  This  procedure  yields  consistent  estimates  of  the 
multiple  correlations  under  the  model  with  some  known  correlations,  but  the 
estimates so derived are not the maximum likelihood estimates (see Olkin &
Sylvan, 1977). One explanation is that the maximum likelihood estimates (MLEs)
of  joint  covariance  matrices  are  rather  complex  when  some  correlations  are  known, 
which  in  turn  yield  rather  complicated  (or  intractable)  expressions  for  the  MLEs 
of  the  multiple  correlations.  The  strategy  suggested  here  has  the  advantages 
that  it  produces  consistent  estimates  with  reduced  variance  when  some 
correlations  are  known  (compared  to  the  situation  when  all  correlations  must  be 
estimated),  it  is  quite  flexible  as  to  patterns  of  known  correlations  that  can 
be  handled,  and  it  can  be  further  generalized  to  cases  where  data  from  an 
independent  sample  are  pooled  together  to  strengthen  estimates. 

Results 1, 2, 3, and 4 generalize directly to the case where some correlations
are known. In the case where all correlations were estimated, we derived the
asymptotic distributions of functions (e.g., determinants and multiple
correlations) from those estimated correlations. When some correlations are
known, we consider functions of both the estimated correlations and the known
correlations. The key to the generalization of results is the recognition that,
since a known correlation is a fixed constant, its variance and its covariance
with any other quantity must be zero. Also, any function whose arguments are all
fixed must itself be a fixed constant. Using this idea, the generalization of
Result 1 is given below as Result 7.

Result 7: Generalization of Result 1 where Some Correlations Are Known

Let $X_0, X_1, \ldots, X_m$ be random variables (representing criterion and test
scales) that have a joint multivariate normal distribution. Let
$\alpha_1, \ldots, \alpha_k$ be nonempty sets of the integers 0 through $m$
inclusive, denoting collections of the $m$ subtests. Let
$R(\alpha_1), \ldots, R(\alpha_k)$ be the correlation matrices of the variables
implied by $\alpha_1, \ldots, \alpha_k$, respectively, where at least one of the
elements of each $R(\alpha_i)$ is a sample correlation and the others are
population correlations. For each $i = 1, \ldots, k$, define a status-indicator
matrix $K(\alpha_i)$ with the same dimensions as $R(\alpha_i)$, but where the
elements of $K(\alpha_i)$ are defined as 0 or 1 depending upon whether the
corresponding element $r_{(i)st}$ of $R(\alpha_i)$ is known. Specifically,

$$k_{(i)st} = \begin{cases} 0 & \text{if } r_{(i)st} = \rho_{(i)st} \text{ (known)}, \\ 1 & \text{if } r_{(i)st} \text{ is estimated}, \end{cases}$$

for $s, t \in \alpha_i$. Then the asymptotic joint distribution of
$|R(\alpha_1)|, \ldots, |R(\alpha_k)|$, when all of the determinants are computed
from correlations based on the same sample of size $n$, is given by

$$\sqrt{n}\,\big[(|R(\alpha_1)|, \ldots, |R(\alpha_k)|) - (|P(\alpha_1)|, \ldots, |P(\alpha_k)|)\big] \;\to\; N(0, \Sigma),$$

where $\Sigma = (\sigma_{ij})$ with

$$\sigma_{ij} = \sum_{\substack{s, t \in \alpha_i \\ s < t}} \;\sum_{\substack{u, v \in \alpha_j \\ u < v}}
4\,|P(\alpha_i)|\,|P(\alpha_j)|\,k_{(i)st}\,k_{(j)uv}\,\rho_{(i)}^{st}\,\rho_{(j)}^{uv}
\Big\{ \tfrac{1}{2}\rho_{st}\rho_{uv}\big(\rho_{su}^2 + \rho_{sv}^2 + \rho_{tu}^2 + \rho_{tv}^2\big)
+ \rho_{su}\rho_{tv} + \rho_{sv}\rho_{tu}
- \big(\rho_{st}\rho_{su}\rho_{sv} + \rho_{st}\rho_{tu}\rho_{tv} + \rho_{su}\rho_{tu}\rho_{uv} + \rho_{sv}\rho_{tv}\rho_{uv}\big) \Big\},$$

where the sums in $\sigma_{ij}$ are taken so that $s < t$ and $u < v$,
$\rho_{(i)}^{st}$ is the element in row $s$ and column $t$ of $P^{-1}(\alpha_i)$,
the inverse of $P(\alpha_i)$, and $\rho_{ss} = 1$.
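Result 7 can be sketched in Python with NumPy; the names are hypothetical. The sketch reuses the Pearson-Filon term of Result 1 and multiplies each term by the status indicators. For simplicity it assumes that a correlation known in one matrix is known in every matrix in which it appears. Under that assumption, a single set of known pairs serves for all $K(\alpha_i)$.

```python
import itertools
import numpy as np

def pf_cov(P, s, t, u, v):
    """Asymptotic n*Cov(r_st, r_uv) for multivariate normal data."""
    return (0.5 * P[s, t] * P[u, v] * (P[s, u]**2 + P[s, v]**2 + P[t, u]**2 + P[t, v]**2)
            + P[s, u] * P[t, v] + P[s, v] * P[t, u]
            - (P[s, t] * P[s, u] * P[s, v] + P[s, t] * P[t, u] * P[t, v]
               + P[s, u] * P[t, u] * P[u, v] + P[s, v] * P[t, v] * P[u, v]))

def det_cov_result7(P, a_i, a_j, known=frozenset()):
    """sigma_ij of Result 7; `known` holds frozenset({s, t}) pairs whose rho_st is known."""
    def k(s, t):
        # status indicator k_(i)st: 0 if the correlation is known, 1 if estimated
        return 0.0 if frozenset((s, t)) in known else 1.0

    Pi, Pj = P[np.ix_(a_i, a_i)], P[np.ix_(a_j, a_j)]
    inv_i, inv_j = np.linalg.inv(Pi), np.linalg.inv(Pj)
    total = 0.0
    for (x, s), (y, t) in itertools.combinations(enumerate(a_i), 2):
        for (w, u), (z, v) in itertools.combinations(enumerate(a_j), 2):
            total += (k(s, t) * k(u, v) * inv_i[x, y] * inv_j[w, z]
                      * pf_cov(P, s, t, u, v))
    return 4.0 * np.linalg.det(Pi) * np.linalg.det(Pj) * total

P3 = np.array([[1.0, 0.4, 0.3], [0.4, 1.0, 0.5], [0.3, 0.5, 1.0]])
all_estimated = det_cov_result7(P3, [1, 2], [1, 2])
pair_known = det_cov_result7(P3, [1, 2], [1, 2], known={frozenset((1, 2))})
```

When the only correlation in $R(\alpha_i)$ is known, its determinant is a constant and the variance is exactly zero, as the text requires.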


Result 8: Generalization of Result 2 where Some Correlations Are Known

Let $X_0, X_1, \ldots, X_m$ be random variables with a joint nonsingular
multivariate normal distribution. Let $R_1$ be the sample estimate of the
multiple correlation with $X_0$ of a subset of variables identified by the set of
indices $\alpha_1$, and let $R_2$ be the sample estimate of the multiple
correlation with $X_0$ of a subset of variables identified by the set of indices
$\alpha_2$. Some (but not all) of the bivariate correlations may be known. Thus
the corresponding population correlations may be substituted for the
corresponding sample correlations in the computation of $R_1$ and $R_2$. Whenever
sample correlations are used to compute $R_1$ or $R_2$, they are based on the
same sample of size $n$. Let the status indicator functions $L(\alpha_i)$ and
$L(0, \alpha_i)$ be defined so that

$$L(\alpha_i) = \begin{cases} 0 & \text{if all elements of } R(\alpha_i) \text{ are known}, \\ 1 & \text{if at least one element of } R(\alpha_i) \text{ is estimated}, \end{cases}$$

and

$$L(0, \alpha_i) = \begin{cases} 0 & \text{if all elements of } R(0, \alpha_i) \text{ are known}, \\ 1 & \text{if at least one element of } R(0, \alpha_i) \text{ is estimated}. \end{cases}$$

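Then the asymptotic joint distribution of $R_1^2$ and $R_2^2$ is given by

$$\sqrt{n}\,\big[(R_1^2, R_2^2) - (P_1^2, P_2^2)\big] \;\to\; N(0, \Sigma),$$

where $\Sigma = (\sigma_{ij})$ and

$$\sigma_{11} = \frac{c_1 L(0,\alpha_1)}{|P(\alpha_1)|^2} + \frac{c_2 L(\alpha_1)\,|P(0,\alpha_1)|^2}{|P(\alpha_1)|^4} - \frac{2c_3 L(\alpha_1) L(0,\alpha_1)\,|P(0,\alpha_1)|}{|P(\alpha_1)|^3},$$

$$\sigma_{22} = \frac{c_4 L(0,\alpha_2)}{|P(\alpha_2)|^2} + \frac{c_5 L(\alpha_2)\,|P(0,\alpha_2)|^2}{|P(\alpha_2)|^4} - \frac{2c_6 L(\alpha_2) L(0,\alpha_2)\,|P(0,\alpha_2)|}{|P(\alpha_2)|^3},$$

$$\sigma_{12} = \frac{C}{|P(\alpha_1)|\,|P(\alpha_2)|},$$

$$C = c_7 L(0,\alpha_1) L(0,\alpha_2) - \frac{c_8 L(\alpha_1) L(0,\alpha_2)\,|P(0,\alpha_1)|}{|P(\alpha_1)|} - \frac{c_9 L(0,\alpha_1) L(\alpha_2)\,|P(0,\alpha_2)|}{|P(\alpha_2)|} + \frac{c_{10} L(\alpha_1) L(\alpha_2)\,|P(0,\alpha_1)|\,|P(0,\alpha_2)|}{|P(\alpha_1)|\,|P(\alpha_2)|},$$

with

$c_1 = \mathrm{Var}(|R(0,\alpha_1)|)$, $c_2 = \mathrm{Var}(|R(\alpha_1)|)$,
$c_3 = \mathrm{Cov}(|R(\alpha_1)|, |R(0,\alpha_1)|)$,
$c_4 = \mathrm{Var}(|R(0,\alpha_2)|)$, $c_5 = \mathrm{Var}(|R(\alpha_2)|)$,
$c_6 = \mathrm{Cov}(|R(\alpha_2)|, |R(0,\alpha_2)|)$,
$c_7 = \mathrm{Cov}(|R(0,\alpha_1)|, |R(0,\alpha_2)|)$,
$c_8 = \mathrm{Cov}(|R(\alpha_1)|, |R(0,\alpha_2)|)$,
$c_9 = \mathrm{Cov}(|R(0,\alpha_1)|, |R(\alpha_2)|)$,
$c_{10} = \mathrm{Cov}(|R(\alpha_1)|, |R(\alpha_2)|)$,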

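and the variances and covariances $c_1, \ldots, c_{10}$ are given in Result 7.
The asymptotic distribution of $R_1$ and $R_2$ is given by

$$\sqrt{n}\,\big[(R_1, R_2) - (P_1, P_2)\big] \;\to\; N(0, T),$$

where $T = (\tau_{ij})$,

$$\tau_{11} = \frac{c_1 L(0,\alpha_1)}{4|P(\alpha_1)|^2 P_1^2} + \frac{c_2 L(\alpha_1)\,|P(0,\alpha_1)|^2}{4|P(\alpha_1)|^4 P_1^2} - \frac{c_3 L(\alpha_1) L(0,\alpha_1)\,|P(0,\alpha_1)|}{2|P(\alpha_1)|^3 P_1^2},$$

$$\tau_{22} = \frac{c_4 L(0,\alpha_2)}{4|P(\alpha_2)|^2 P_2^2} + \frac{c_5 L(\alpha_2)\,|P(0,\alpha_2)|^2}{4|P(\alpha_2)|^4 P_2^2} - \frac{c_6 L(\alpha_2) L(0,\alpha_2)\,|P(0,\alpha_2)|}{2|P(\alpha_2)|^3 P_2^2},$$

$$\tau_{12} = \frac{C}{4|P(\alpha_1)|\,|P(\alpha_2)|\,P_1 P_2},$$

and $c_1, \ldots, c_6$ and $C$ are given above.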

Results 3, 4, 5, and 6 are correct as stated for the case of some known
correlations, provided that the covariance matrix for the joint distribution of
$(R_1, R_2)$ derived via Results 7 and 8 is used in place of that given in
Results 1 and 2.

Note

Although Results 7 and 8 provide a method to increase the precision of
estimates by using known values of intercorrelations among predictor variables,
extensive computations have shown that the method produces only a small increase
in precision. The method given in these results is computationally rather
involved and could thus be justified only if sample sizes were quite marginal.
If the overall power of tests for pooled incremental validity is adequate, the
additional precision afforded by these methods does not justify the added
computational complexity.


SUMMARY  OF  PROCEDURES  FOR  SYNTHESIZING 
INCREMENTAL  VALIDITY  RESULTS 

This  section  is  a  practical  guide  to  procedures  for  synthesizing  the 
results  of  incremental  validity  studies.  It  provides  a  step-by-step  listing  of 
procedures  to  be  followed  for  both  estimation  of  incremental  validity  across 
studies and testing of the combined significance of the results. An example
based on hypothetical results from four schools demonstrates the application of
the procedures.¹

Step I: Conduct the Incremental Validity Study at Each Site

At each site (school), the incremental validity study compares a sample
validity $R_1$ with another sample validity $R_2$ to determine whether $R_2$ is
larger than $R_1$. Formally, this involves a test of the hypothesis that the
population validity $P_2$ associated with $R_2$ exceeds the population validity
$P_1$ associated with $R_1$; that is, a test of the null hypothesis

$$H_0: P_2 = P_1,$$

or the identical test that $P_2^2 = P_1^2$, against the one-sided alternative
$P_2 > P_1$. The details of the hypothesis test depend on whether dependent or
independent samples are used to compute $R_1$ and $R_2$, as discussed above.


R₁ and R₂ Computed from the Same Sample

Case 1. If $R_1$ and $R_2$ are computed from the same sample and the predictors
for $R_1$ are a subset of the predictors for $R_2$, then the appropriate test for
incremental validity is the usual F-test for change in multiple correlation. Let
$a$ be the number of tests used as predictors in $R_1$, let $b > a$ be the number
of tests used as predictors in $R_2$, and let $n$ be the sample size. Compute the
F-test given in Equation 1 and compare it to the critical value for an
F-distribution with $(b - a)$ and $(n - b - 1)$ degrees of freedom. Reject the
hypothesis of no incremental validity if the computed value of $F$ exceeds the
critical value.
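Equation 1 is not reproduced in this section. The sketch below assumes that it is the standard F statistic for the change in $R^2$ when $b - a$ predictors are added: $F = [(R_2^2 - R_1^2)/(b - a)] / [(1 - R_2^2)/(n - b - 1)]$. With the Air Traffic Controller values of Table 6, it gives $F \approx 0.894$ on (10, 449) degrees of freedom, matching Table 7. The function name is hypothetical. Computing the p value would also require the F distribution (for example, from SciPy), which is omitted here.

```python
def f_change(R1, R2, n, a, b):
    """F statistic for the increase in squared multiple correlation (nested predictors).

    R1 uses a predictors, R2 uses b > a predictors that include the first a,
    and n is the sample size. Returns (F, df1, df2).
    """
    if not b > a:
        raise ValueError("b must exceed a")
    df1, df2 = b - a, n - b - 1
    F = ((R2**2 - R1**2) / df1) / ((1 - R2**2) / df2)
    return F, df1, df2

F, df1, df2 = f_change(0.400, 0.420, 470, 10, 20)   # Air Traffic Controller row
```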

Case 2. If $R_1$ and $R_2$ are computed from the same sample but one set of
predictors is not a subset of the other, the usual F-test for change in multiple
correlation cannot be used. Compute the test statistic $X^2$ given in
Equation 2. Reject the hypothesis of no incremental validity at significance
level $\alpha$ if the computed value of $X^2$ exceeds the $100(1-\alpha)$
percentile point of the chi-squared distribution with one degree of freedom.

¹ The methods described in Steps IV through VII are less accurate than the
methods of Appendix B when the second predictor set includes the first, as it
does in the examples. The methods described in these sections are valid if the
predictor sets are disjoint or if the samples are independent.

R₁ and R₂ Computed from Independent Samples

If $R_1$ and $R_2$ are computed from independent samples, the usual F-test for
change in multiple correlation cannot be used. Compute the test statistic $X^2$
given in Equation 3. Reject the hypothesis of no incremental validity at
significance level $\alpha$ if the computed value of $X^2$ exceeds the
$100(1-\alpha)$ percentile point of the chi-squared distribution with one degree
of freedom.

Example 

Table 6 shows a small data set representing the results of validity studies on
four independent samples or schools. Separate regressions (using batteries 1 and
2) have been conducted for each school to obtain the values of $R_1$ and $R_2$
for each single sample of subjects. Table 6 also shows the differences in squared
correlations that lead to the individual significance (F) tests. For this
example, we have used $a = 10$ and $b = 20$ for all four schools or studies.
(Either $a$ or $b$ or both could vary across studies, however.)

Each school's F-test is presented in Table 7. The upper-tail p values (in the
$p_i$ column) indicate that significant increases in validity are found for two
of the four schools. The probabilities range from .004 to .539, and two of the
results are "significant" by traditional standards (i.e., $p < .05$).


Table 6
Example: Data

School                      n     R₁ (a)       R₂ (b)       R₂² − R₁²
Air Traffic Controller      470   .400 (10)    .420 (20)    .016
Fire Control Technician     530   .380 (10)    .424 (20)    .036
Gunner's Mate               700   .440 (10)    .473 (20)    .030
Electrician's Mate          460   .250 (10)    .290 (20)    .022

Step II: Compute Tests of Combined Significance of Incremental Validity

The validity study conducted at each site will have provided a significance
test as described in Step I. From each study's significance test, an
upper-tailed probability is obtained. These values $p_1$ through $p_k$ should
then be used to compute either Stouffer's (Stouffer et al., 1949) or Fisher's
(1932) combined significance test, depending on the expected outcomes of
interest.

Stouffer's test, given in Equation 4, may be somewhat more likely to detect the
outcome in which all sites show roughly equal-sized increments to validity.
Fisher's test (Equation 5) should be used if the question of added validity for
any population is of interest. The hypothesis of no increment to validity in any
population is rejected at level $\alpha$ if the selected test statistic exceeds
the $100(1-\alpha)$ percent critical value in the appropriate reference
distribution.
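Equations 4 and 5 are not reproduced in this section. The sketch below assumes their standard forms. Stouffer's statistic is $Z = \sum_i z(p_i)/\sqrt{k}$, referred to the standard normal distribution. Fisher's statistic is $-2\sum_i \ln p_i$, referred to the chi-square distribution with $2k$ degrees of freedom. The sketch uses only the Python standard library: for even degrees of freedom, the chi-square upper tail has the closed form $e^{-x/2}\sum_{j=0}^{k-1}(x/2)^j/j!$. The function names are hypothetical. Applied to the p values of Table 7, the sketch gives results close to the reported ones. They do not match exactly because the tabled p values are rounded.

```python
import math
from statistics import NormalDist

def stouffer(pvalues):
    """Stouffer's combined test: Z = sum of z(p_i) / sqrt(k). Returns (Z, upper-tail p)."""
    nd = NormalDist()
    z = [nd.inv_cdf(1 - p) for p in pvalues]     # z(p_i): upper-tail normal deviates
    Z = sum(z) / math.sqrt(len(pvalues))
    return Z, 1 - nd.cdf(Z)

def fisher(pvalues):
    """Fisher's combined test: X = -2 sum ln p_i on 2k df. Returns (X, upper-tail p)."""
    k = len(pvalues)
    X = -2 * sum(math.log(p) for p in pvalues)
    half = X / 2
    # Closed-form upper tail of the chi-square distribution with even df = 2k
    tail = math.exp(-half) * sum(half**j / math.factorial(j) for j in range(k))
    return X, tail

p_values = [0.539, 0.016, 0.004, 0.393]     # p_i column of Table 7
Z, p_Z = stouffer(p_values)
X, p_X = fisher(p_values)
```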


Example 

Table 7 shows the values of the transformed p's used in the two combined
significance tests. The values for the normal deviates ($z(p_i)$) and the
log-transformed p's were obtained using the mathematical and probability
functions of the Minitab mainframe-computer package (Ryan et al., 1985).


Table 7
Example: Computation of Significance Tests

School                      F       (df)         pᵢ      z(pᵢ)    log(pᵢ)
Air Traffic Controller      0.894   (10, 449)    .539    -0.097    -0.62
Fire Control Technician     2.220   (10, 509)    .016     2.155    -4.16
Gunner's Mate               2.600   (10, 679)    .004     2.636    -5.47
Electrician's Mate          1.059   (10, 439)    .393     0.272    -0.93
Totals                                                    4.965   -11.19

The Stouffer value, $4.965/\sqrt{4} = 2.48$, is significant when compared with
the standard normal distribution ($p = .007$). The Fisher value,
$-2(-11.19) = 22.37$ (the logarithms are natural logarithms), is compared with
the chi-square distribution with $2k = 8$ degrees of freedom. The observed level
of significance for the Fisher test was .0043, only slightly smaller than the
probability for the Stouffer test. Both are significant even at the relatively
stringent $\alpha = .01$ level.

Both  tests  indicate  that  the  null  model,  of  no  increment  to  validity  in  any 
population studied, should be rejected. The additional test battery does add to
validity  in  at  least  one  of  the  populations  studied.  The  combined 
significance  methods  cannot  identify  which  population  or  populations  show  this 
added  validity,  however. 

Step III: Obtain Information for Artifact Correction in Each Study

In  order  to  correct  the  incremental  validity  in  a  study  for  the  artifacts 
of  unreliability  and  restriction  of  range,  two  pieces  of  information  are  needed. 
One  is  the  criterion  reliability.  The  other  is  the  ratio  u  of  the  standard 
deviation  of  the  test  score  in  the  unrestricted  population  to  the  standard 
deviation  in  the  study.  The  unrestricted  population  must,  of  course,  be  defined 
in the same way for all studies. A good choice for the unrestricted population
would  be  the  general  applicant  pool  that  takes  the  ASVAB.  No  example  of  artifact 
correction  is  provided  here. 

Step IV: Compute the Index of Incremental Validity and its Variance for Each Study

In order to combine the incremental validities across studies, it is necessary
to compute the index of incremental validity and its sampling variance in each
study. The entire process should be done once for the index $d$ of change in
multiple correlations and once for the index $d^*$ of change in squared multiple
correlations. First compute the indexes of artifact-corrected incremental
validity using the formulas given in Equations 6 and 7. The sampling variances of
these indexes depend on whether $R_1$ and $R_2$ are computed from the same sample
or from independent samples.

If $R_1$ and $R_2$ are computed from the same sample at a particular site, use
the formulas given in Equations 12 and 13 to compute the sampling variances of
$d$ and $d^*$. If $R_1$ and $R_2$ are computed from independent samples, use the
formulas given in Equations 10 and 11 to compute the sampling variances of $d$
and $d^*$.

If artifact corrections are not used, then $c_1$, $c_2$, and $y$ are all
assigned a value of 1 in Equations 10, 11, 12, or 13 when computing the sampling
variance of incremental validity.


Example 

Table 8 shows the multiple correlations $R_1$ and $R_2$, the covariances between
$R_1$ and $R_2$, and the estimates of $d$ and their variances for the four
hypothetical schools. Analogous values for $d^*$ (the difference in squared
correlations) and the covariances between $R_1^2$ and $R_2^2$ are shown in
Table 9. Because the data are from one sample within each school and artifact
corrections were not applied, Equations 12 and 13 were used to compute the
variances, with the values of $c_1$, $c_2$, and $y$ set to 1.0.

Table 8

Example: Estimates and Variances of Differences in Correlations

School                      R1      R2      d = R2 - R1   Cov(R1, R2)   σ̂²(d)    σ̂(d)
Air Traffic Controller      0.400   0.420   0.020         0.66          0.0001   0.0117
Fire Control Technician     0.380   0.424   0.044         0.63          0.0003   0.0166
Gunner's Mate               0.440   0.473   0.033         0.56          0.0002   0.0137
Electrician's Mate          0.250   0.290   0.040         0.76          0.0004   0.0208

Table 9

Example: Estimates and Variances of Differences in Squared Correlations

School                      R1²     R2²     d* = R2² - R1²   Cov(R1², R2²)   σ̂²(d*)   σ̂(d*)
Air Traffic Controller      0.160   0.176   0.016            0.445           0.0001   0.0092
Fire Control Technician     0.144   0.180   0.036            0.417           0.0001   0.0116
Gunner's Mate               0.194   0.224   0.030            0.464           0.0002   0.0128
Electrician's Mate          0.062   0.084   0.022            0.220           0.0001   0.0115
It is also possible to compute confidence intervals using the $d$ and $d^*$ estimates from the four studies. Table 10 shows 95 percent confidence intervals for the population differences in correlations ($\rho_2 - \rho_1$) and for the population differences in squared correlations ($\rho_2^2 - \rho_1^2$) for the four schools in the example. These confidence intervals also provide an alternative method of testing the null hypothesis of no incremental validity in each study. However, for smaller samples the usual F-test will be more accurate, since the confidence intervals are based on asymptotic (large-sample) results.

Note that negative values of $\rho_2 - \rho_1$ are impossible when battery 2 includes battery 1 (and they may be highly implausible in other circumstances). If negative lower confidence limits are computed in such circumstances, they should be truncated to zero.
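The Table 10 limits are consistent with the usual large-sample interval, estimate ± 1.96 standard errors. The Python sketch below assumes that form (the report's own computation is not shown), applies the zero truncation just described for nested batteries, and uses the rounded entries of Table 8, so its third decimal can differ slightly from Table 10. The function name `study_ci` is hypothetical.

```python
# Sketch: per-study 95% confidence intervals for an incremental-validity index,
# assuming the large-sample form  estimate +/- 1.96 * standard error.
def study_ci(estimate, std_error, z_crit=1.96, nested=True):
    """Return (lower, upper). When battery 2 includes battery 1 (nested=True),
    a negative lower limit is truncated to zero."""
    lower = estimate - z_crit * std_error
    upper = estimate + z_crit * std_error
    if nested:
        lower = max(lower, 0.0)
    return lower, upper


# d = R2 - R1 and its standard error, as rounded in Table 8.
table8 = {
    "Air Traffic Controller": (0.020, 0.0117),
    "Fire Control Technician": (0.044, 0.0166),
    "Gunner's Mate": (0.033, 0.0137),
    "Electrician's Mate": (0.040, 0.0208),
}

for school, (d, se) in table8.items():
    raw = study_ci(d, se, nested=False)
    truncated = study_ci(d, se)
    print(f"{school:<25} raw ({raw[0]:.3f}, {raw[1]:.3f})"
          f"  truncated ({truncated[0]:.3f}, {truncated[1]:.3f})")
```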


Table 10

Example: Ninety-five Percent Confidence Intervals for Incremental Validities

                            δ = ρ2 - ρ1                  δ* = ρ2² - ρ1²
School                      Lower limit   Upper limit    Lower limit   Upper limit
Air Traffic Controller      -0.003         0.043         -0.002         0.035
Fire Control Technician      0.012         0.076          0.013         0.059
Gunner's Mate                0.006         0.060          0.005         0.055
Electrician's Mate          -0.001         0.081         -0.000         0.045

Note: Negative values are included to illustrate computations.


Step V: Calculate the Variance Across Studies of the Population Values of the Incremental Validities in the Unrestricted Population


Compute the estimate of the variance across studies of the population values of the incremental validities in the unrestricted populations using the formula given in Equation 18. Carry out the analysis once for $\hat{\delta}$ and once for $\hat{\delta}^*$. That is, if $\hat{\delta}_1, \ldots, \hat{\delta}_k$ are the $\hat{\delta}$ indices from the $k$ studies to be combined, let

$$T_1 = \hat{\delta}_1,\; T_2 = \hat{\delta}_2,\; \ldots,\; T_k = \hat{\delta}_k$$

and

$$S^2(T_1) = \hat{\sigma}^2(\hat{\delta}_1),\; S^2(T_2) = \hat{\sigma}^2(\hat{\delta}_2),\; \ldots,\; S^2(T_k) = \hat{\sigma}^2(\hat{\delta}_k),$$

and apply the formula given in Equation 18. Then carry out the same process with the $\hat{\delta}^*$ values. If $\hat{\delta}_1^*, \ldots, \hat{\delta}_k^*$ are the $\hat{\delta}^*$ indices from the $k$ studies, use Equation 18 with

$$T_1 = \hat{\delta}_1^*,\; \ldots,\; T_k = \hat{\delta}_k^*$$

and

$$S^2(T_1) = \hat{\sigma}^2(\hat{\delta}_1^*),\; \ldots,\; S^2(T_k) = \hat{\sigma}^2(\hat{\delta}_k^*).$$

To test the hypothesis that the incremental validity varies across studies, compute the test statistic $H$ using the computational formula given in Equation 19.
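Equations 18 and 19 are not reproduced in this section. As a hedged sketch, the Python code below assumes that Equation 18 is the unweighted method-of-moments estimator (the sample variance of the $T_i$, with divisor $k - 1$, minus the mean of the $S^2(T_i)$) and that Equation 19 is the usual computational form $H = \sum w_i T_i^2 - (\sum w_i T_i)^2 / \sum w_i$ with $w_i = 1/S^2(T_i)$. With the example data ($d$ from Table 8, sampling variances taken as the reciprocals of the Table 11 weights), these assumed forms reproduce the values reported in the example: a variance component of about -0.00015, H = 1.714, and p = .63. They should still be checked against the report's equations. The chi-square tail uses the exact recurrence for integer degrees of freedom; all function names are illustrative.

```python
import math


def variance_component(t, s2):
    """Assumed form of Equation 18: unweighted variance of the T_i minus the
    mean sampling variance. Negative results are conventionally truncated to 0."""
    k = len(t)
    mean_t = sum(t) / k
    var_t = sum((ti - mean_t) ** 2 for ti in t) / (k - 1)
    return var_t - sum(s2) / k


def chi2_sf(x, df):
    """Upper-tail probability of a central chi-square with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term = total = math.exp(-half)
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return total
    # Odd df: start from df = 1 and add one term per step of 2 in df.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) * math.exp(-half) / math.gamma(1.5)
    k = 1
    while k < df:
        total += term
        k += 2
        term *= half / (k / 2.0)
    return total


def homogeneity_h(t, s2):
    """Assumed form of Equation 19 with inverse-variance weights; returns (H, p)."""
    w = [1.0 / v for v in s2]
    sw = sum(w)
    swt = sum(wi * ti for wi, ti in zip(w, t))
    swt2 = sum(wi * ti * ti for wi, ti in zip(w, t))
    h = swt2 - swt ** 2 / sw
    return h, chi2_sf(h, len(t) - 1)


d = [0.020, 0.044, 0.033, 0.040]                  # Table 8
weights = [7353.29, 3642.53, 5293.70, 2314.85]    # Table 11
s2 = [1.0 / w for w in weights]                   # unrounded sampling variances

tau2 = variance_component(d, s2)
h, p = homogeneity_h(d, s2)
print(f"variance component {tau2:.5f} (truncated: {max(tau2, 0.0)})")
print(f"H = {h:.3f}, df = {len(d) - 1}, p = {p:.2f}")
```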
Example

Again our example is based on the uncorrected estimates $d$ and $d^*$. For both the $d$ and $d^*$ estimates, the variance components computed using Equation 18 were estimated to be zero. This suggests that there is no variation in the parameters representing incremental validity, whether differences in multiple correlations or differences in squared multiple correlations are used. All populations under study can be considered to show the same increment in validity due to the added predictor variables.

In fact, both values were slightly negative, though very small: -0.00015 for $d$ and -0.00006 for $d^*$. Conventionally, however, such negative variance-component estimates are truncated to zero.

Table 11 shows the terms used in the computation of the $H$ statistics (and of the pooled estimates and their standard errors) for the two incremental validity measures. The weights for the two measures (labeled $w_i$ and $w_i^*$) are computed as the inverses of the variances of each school's estimates. Each weight is then multiplied by its school's incremental validity estimate and by the square of that estimate, as shown in Table 11.


Table 11

Example: Computation of the Summary Statistics
for Two Incremental Validity Indices

                            d = R2 - R1                       d* = R2² - R1²
School                      w_i        w_i d_i   w_i d_i²    w_i*       w_i* d_i*   w_i* d_i*²
Air Traffic Controller      7353.29    147.07    2.94        11690.25   191.72      3.14
Fire Control Technician     3642.53    160.27    7.05        7376.92    263.92      9.44
Gunner's Mate               5293.70    174.69    5.76        6066.33    180.35      5.36
Electrician's Mate          2314.85    92.59     3.70        7616.13    168.32      3.72
Totals                      18604.37   574.62    19.46       32749.64   804.30      21.67

The homogeneity test statistics for both measures of incremental validity also support the finding of consistency in the magnitudes of the population parameters. In each case, the test statistic $H$ has a chi-square distribution with three degrees of freedom under the null hypothesis of no variation in population parameters. Using Equation 19, the homogeneity test is $H = 1.714$ ($p = .63$, $df = 3$) for the differences in multiple correlations (i.e., the value computed using the $d$ estimates). The value of the test for the squared multiple correlations is $H = 1.915$ ($p = .59$, $df = 3$). Neither value is significant even at the most lenient conventional significance level (e.g., $\alpha = .10$).

Step VI: Estimate the Combined Estimate of Incremental Validity

The combined estimate of incremental validity is a weighted average of the values from the individual studies. Compute the combined (weighted average) estimate across studies of $\hat{\delta}$ and $\hat{\delta}^*$ separately using the formula given in Equation 14, and use the formula in Equation 16 to compute the standard error of each weighted average. First, let

$$T_1 = \hat{\delta}_1,\; \ldots,\; T_k = \hat{\delta}_k$$

and

$$S^2(T_1) = \hat{\sigma}^2(\hat{\delta}_1),\; \ldots,\; S^2(T_k) = \hat{\sigma}^2(\hat{\delta}_k)$$

to combine the $\hat{\delta}$ values. Then let

$$T_1 = \hat{\delta}_1^*,\; \ldots,\; T_k = \hat{\delta}_k^*$$

and

$$S^2(T_1) = \hat{\sigma}^2(\hat{\delta}_1^*),\; \ldots,\; S^2(T_k) = \hat{\sigma}^2(\hat{\delta}_k^*)$$

to combine the $\hat{\delta}^*$ values.
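A minimal sketch of the Step VI computation, assuming Equation 14 is the inverse-variance weighted mean and Equation 16 gives its standard error as $1/\sqrt{\sum w_i}$. Applied to the example data, these assumed forms give the pooled $d$ of 0.031 and standard error of 0.007 reported in the example. `pooled_estimate` is a hypothetical helper name.

```python
import math


def pooled_estimate(t, s2):
    """Inverse-variance weighted mean of the T_i and its standard error
    (assumed forms of Equations 14 and 16)."""
    w = [1.0 / v for v in s2]
    sw = sum(w)
    estimate = sum(wi * ti for wi, ti in zip(w, t)) / sw
    return estimate, 1.0 / math.sqrt(sw)


d = [0.020, 0.044, 0.033, 0.040]                               # Table 8
s2 = [1.0 / w for w in (7353.29, 3642.53, 5293.70, 2314.85)]   # Table 11 weights
est, se = pooled_estimate(d, s2)
print(f"pooled d = {est:.3f}, SE = {se:.3f}")
```

The sampling variances are back-computed from the Table 11 weights because the variances printed in Table 8 are rounded to four decimals.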

Example 

The combined estimates of incremental validity are weighted averages of the differences in correlations and of the differences in squared correlations shown in Tables 6 and 7. For both measures of added validity, these averages can be considered to represent common parameters, since the null hypothesis of no variation was retained for both. The weighted average difference in multiple correlations is 0.031, with a standard error of 0.007. The weighted average difference in squared multiple correlations is 0.025, with a standard error of 0.006.

Step VII: Compute a Confidence Interval for the Incremental Validity

Use  the  pooled  estimate  of  incremental  validity  and  its  standard  error 
computed  in  Step  VI  to  compute  a  confidence  interval  for  the  incremental 
validity.  If  this  confidence  interval  does  not  contain  zero  or,  equivalently, 
if  the  test  given  in  Equation  17  leads  to  a  significant  Z  value,  reject  the 
hypothesis  of  zero  incremental  validity. 
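Step VII needs only the pooled estimate and its standard error. The sketch below assumes the usual normal-theory interval and takes Equation 17 to be $Z = \text{estimate}/SE$; the helper name is hypothetical. Using the Table 11 totals, it gives limits that agree with the interval reported in the example to within about 0.0001 (the report's inputs were presumably less rounded) and Z = 4.21.

```python
import math
from statistics import NormalDist


def pooled_ci_and_z(estimate, std_error, level=0.95):
    """Normal-theory CI and Z test for a pooled index (Z = estimate / SE assumed)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(0.5 + level / 2.0)      # about 1.96 for a 95% interval
    z = estimate / std_error
    p_two_sided = 2.0 * nd.cdf(-abs(z))
    lower = estimate - z_crit * std_error
    upper = estimate + z_crit * std_error
    return (lower, upper), z, p_two_sided


# Pooled d from the Table 11 totals: sum(w*d) / sum(w) and 1 / sqrt(sum(w)).
est = 574.62 / 18604.37
se = 1.0 / math.sqrt(18604.37)
(lo, hi), z, p = pooled_ci_and_z(est, se)
print(f"{lo:.4f} < rho2 - rho1 < {hi:.4f};  Z = {z:.2f}, two-sided p = {p:.1e}")
```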

Example 

Ninety-five percent confidence intervals were computed using the two pooled estimates of the validity increment. For the difference in multiple correlations, the interval is

$$0.0166 < \rho_2 - \rho_1 < 0.0452.$$

The interval does not contain zero, which suggests that a significant increment to the validity of prediction can be expected across all populations. (Similarly, one can compute a Z test; for these data the value is $Z = 4.21$, $p < .001$.)




The confidence interval for the population difference in squared correlations is

$$0.0137 < \rho_2^2 - \rho_1^2 < 0.0354,$$

and the test of the null hypothesis that the population difference in squared correlations equals zero gives $Z = 4.44$ ($p < .0001$). Again the results indicate a nonzero incremental validity across schools.
CONCLUSIONS


Pooling estimates across sites provides a viable strategy for estimating incremental validity. If a single sample is used in each site to assess incremental validity, the test for the statistical significance of the pooled estimate will have adequate power to detect increments in validity of .02 given pooled sample sizes of N ≥ 4,000.

RECOMMENDATION 

Estimates  of  the  incremental  validity  of  alternative  test  batteries  should 
be  based  on  pooled  estimates  derived  from  several  samples,  using  the  methods 
outlined  in  this  report. 
APPENDIX A


STATISTICAL ANALYSIS SYSTEM (SAS) PROGRAM
TO COMPUTE COMBINED SIGNIFICANCE TESTS


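SAS Program to Compute Combined Significance Tests


OPTIONS NOCENTER;

/* One line per study: R-squared for battery 1 (RS1) and battery 2 (RS2), */
/* number of predictors in each battery (A, B), and sample size (N).       */
DATA ONE;
  INPUT RS1 RS2 A B N;
  DIFF = RS2 - RS1;                       /* increment in R-squared            */
  DF1  = B - A;                           /* numerator degrees of freedom      */
  DF2  = N - B - 1;                       /* denominator degrees of freedom    */
  F    = DIFF * DF2 / (DF1 * (1 - RS2));  /* F test for the increment          */
  P    = 1 - PROBF(F, DF1, DF2);          /* upper-tail p-value                */
  Z    = -PROBIT(P);                      /* normal deviate, large when P small */
  LOGP = LOG(P);
CARDS;
.160 .176 10 20 470
.144 .180 10 20 530
.194 .224 10 20 460
.062 .084 10 20 700
;
PROC PRINT; VAR RS1 RS2 N A B F P;
PROC PRINT; VAR DIFF Z LOGP;

/* Sum Z and LOG(P) over the K studies */
PROC MEANS NOPRINT SUM N;
  VAR Z LOGP;
  OUTPUT OUT=SUMS SUM=SUMZ SUML N=K;

/* Stouffer combined test (ZS, PS) and Fisher combined test (CF, PC) */
DATA TCS; SET SUMS;
  ZS = SUMZ / SQRT(K);      CF = -2 * SUML;
  PS = 1 - PROBNORM(ZS);    PC = 1 - PROBCHI(CF, 2*K);
PROC PRINT DATA=TCS; VAR K ZS PS CF PC;
RUN;
ENDSAS;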
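For readers without SAS, the same calculations can be cross-checked in Python: the F test for each increment in R², then Stouffer's combined Z and Fisher's combined chi-square, which are what the Z and LOGP columns feed in the SAS program. This is a sketch, not a port of SAS internals: the F tail probability comes from a standard continued-fraction evaluation of the regularized incomplete beta function, and the Fisher tail uses the closed form for even degrees of freedom. Function names are illustrative.

```python
import math
from statistics import NormalDist


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for this m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f, df1, df2):
    """P(F > f) for a central F(df1, df2); plays the role of 1 - PROBF."""
    return betai(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def stouffer(pvalues):
    """Combined Z = sum(z_i) / sqrt(k) with z_i = -probit(p_i); returns (Z, p)."""
    nd = NormalDist()
    zs = sum(-nd.inv_cdf(p) for p in pvalues) / math.sqrt(len(pvalues))
    return zs, nd.cdf(-zs)


def fisher(pvalues):
    """Combined chi-square -2 * sum(log p_i) on 2k df; returns (statistic, p)."""
    x = -2.0 * sum(math.log(p) for p in pvalues)
    half, term, total = x / 2.0, 1.0, 1.0
    for j in range(1, len(pvalues)):
        term *= half / j
        total += term
    return x, math.exp(-half) * total


# Same data lines as the SAS CARDS: RS1, RS2, A, B, N.
cards = [(.160, .176, 10, 20, 470), (.144, .180, 10, 20, 530),
         (.194, .224, 10, 20, 460), (.062, .084, 10, 20, 700)]
pvals = []
for rs1, rs2, a, b, n in cards:
    df1, df2 = b - a, n - b - 1
    f = (rs2 - rs1) * df2 / (df1 * (1.0 - rs2))
    pvals.append(f_upper_tail(f, df1, df2))
    print(f"F = {f:.3f} on ({df1}, {df2}) df, p = {pvals[-1]:.4f}")
print("Stouffer Z = %.3f, p = %.4f" % stouffer(pvals))
print("Fisher chi-square = %.3f, p = %.4f" % fisher(pvals))
```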
APPENDIX B


A SIMULATION STUDY OF THE DISTRIBUTION OF
THE DIFFERENCE IN SQUARED MULTIPLE CORRELATIONS¹


Background 

The main body of this report has presented asymptotic formulas for the variance in the difference in multiple correlations or squared multiple correlations, both for the case of independent samples and for the case when the two correlations are based on the same sample but different sets of predictors. The latter case breaks down into three sub-cases, the first two of which are most important:

(a)  the  second  set  of  predictors  includes  the  first  as  a  subset  (IS) 

(b)  the  two  sets  of  predictors  are  disjoint  (DJ) 

(c)  the  two  predictor  sets  overlap  but  are  neither  inclusive  subsets  nor  disjoint. 

The formulas were intended to apply to several current studies of the incremental validity of adding new aptitude tests to the 10-test Armed Services Vocational Aptitude Battery (ASVAB). In a recent study, Wolfe (1991) reported the validities for predicting school performance in nine Navy technical training schools when four new predictors were added to the ASVAB. Sample sizes ranged from 97 to 929. The validity increments were 0, 0, .001, .007*, .014, .014**, .018**, .029*, and .051**. Subsequent significance tests for the increase in validity from adding a single predictor to the ASVAB showed highly significant improvement for increases as small as .004 when the sample size was 929. The mean validity increase across schools ranged from .002 to .006 when only one predictor was added to the ASVAB. This is an example of Case a described in the first paragraph.

In the same study, an alternate form of the ASVAB was re-administered after enlistment, and its validity was compared with the pre-enlistment ASVAB. After correction for range restriction, the two batteries differed by only .009, on the average. This is an example of a Case b comparison.

Problem 

The variance formulas assume asymptotic normality of the difference in squared multiple correlations. But in Case a, the sample difference in squared correlations is non-negative. If the true population difference is zero, the sample differences will approach zero as the sample size increases, while remaining non-negative. Such a distribution cannot be normal. If the population difference is non-zero but small, we can expect slow convergence toward normality as the sample size increases. The rate at which the sample difference approaches normality will determine whether the asymptotic approximations given earlier in this report will have practical utility.

Approach 

In order to study the behavior of the asymptotic formulas, simulations were performed with six different sets of artificially specified population parameters and three different sample sizes. Table B-1 shows the characteristics of the different samples. The samples for inclusive predictor subsets are labeled with the initial letters IS, and the disjoint sets with the initial letters DJ. The two letters are followed by three digits indicating the true difference in squared multiple correlation. Sample sizes of 100, 400, and 1000 are designated A, B, and C, respectively.

For each simulated sample, uniform pseudo-random numbers were generated by a method due to L'Ecuyer (1988). These were converted to a Gaussian (0, 1) distribution by a circular transformation described by Knuth (1981, p. 116ff.). Finally, a Cholesky factorization of the population correlation matrix of predictors and criterion was used to generate a multivariate normal distribution of raw scores.

¹This appendix was written by John H. Wolfe.

*p < .05. **p < .01.
There were 1000 replications for each sample size and each population correlation matrix. For example, set IS003B consisted of 1000 samples of 400 observations drawn from a population where the true difference in the squared multiple correlation was .003.


Table B-1. Description of Simulation Samples

Samples          N                 r12    r1y    r2y    R1      R2

Inclusive Predictor Sets: {x1}, {x1, x2}
IS000            1000              .50    .400   .200   .4000   .4000
IS003 A,B,C      100, 400, 1000    .30    .500   .200   .5000   .5027
IS006 A,B,C      100, 400, 1000    .60    .400   .300   .4000   .4070
IS013 A,B,C      100, 400, 1000    .50    .400   .300   .4000   .4163

Disjoint Predictor Sets: {x1}, {x2}
DJ000            100               .99    .400   .400   .4000   .4000
DJ010 A,B,C      100, 400, 1000    .70    .455   .466   .4550   .4660
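The population values of $R_1$, $R_2$, and $\delta^*$ follow directly from the three correlations in each row of Table B-1: for an inclusive set, $R_2^2$ is the standard two-predictor formula $(r_{1y}^2 + r_{2y}^2 - 2r_{1y}r_{2y}r_{12})/(1 - r_{12}^2)$, and for a disjoint set $R_2 = r_{2y}$. The Python check below (an illustration, not the author's program) reproduces the $R$ columns and the $\delta^*$ values used in Table B-2 (.002747, .005625, .013333, .010131).

```python
import math

# (r12, r1y, r2y) for each population in Table B-1.
DESIGNS = {
    "IS000": (0.50, 0.400, 0.200),
    "IS003": (0.30, 0.500, 0.200),
    "IS006": (0.60, 0.400, 0.300),
    "IS013": (0.50, 0.400, 0.300),
    "DJ000": (0.99, 0.400, 0.400),
    "DJ010": (0.70, 0.455, 0.466),
}


def population_values(name):
    """Return (R1, R2, delta*) with delta* = R2^2 - R1^2."""
    r12, r1y, r2y = DESIGNS[name]
    r1_sq = r1y ** 2                      # battery 1 is {x1} in both designs
    if name.startswith("IS"):             # battery 2 = {x1, x2}
        r2_sq = (r1y ** 2 + r2y ** 2 - 2 * r1y * r2y * r12) / (1 - r12 ** 2)
    else:                                 # disjoint: battery 2 = {x2}
        r2_sq = r2y ** 2
    return math.sqrt(r1_sq), math.sqrt(r2_sq), r2_sq - r1_sq


for name in DESIGNS:
    r1, r2, delta = population_values(name)
    print(f"{name}: R1 = {r1:.4f}, R2 = {r2:.4f}, delta* = {delta:.6f}")
```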

In each sample, the sample correlation matrix was computed, along with the multiple correlations, their difference, and the difference's asymptotic variance estimated from sample values. These were compared with the asymptotic variance estimated from population values, and with the standard deviation observed across replications.
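The replication loop can be sketched in Python for the inclusive design. The sketch makes substitutions that should be kept in mind: Python's `random.Random` generator and its `gauss` method stand in for L'Ecuyer's generator and Knuth's circular transformation, only $d^*$ is recorded (not the asymptotic variance estimates), and 200 rather than 1000 replications are used to keep the run short. It does illustrate the point made under Problem: every simulated $d^*$ is non-negative, since for nested sets the sample difference equals $(r_{2y} - r_{1y} r_{12})^2/(1 - r_{12}^2)$.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_corr(rows):
    """Sample correlation matrix of a list of observation vectors."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows)
            for j in range(p)] for i in range(p)]
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(p)]
            for i in range(p)]


def simulate_inclusive(r12, r1y, r2y, n, reps, seed=12345):
    """Sample d* = R2^2 - R1^2 with battery 1 = {x1}, battery 2 = {x1, x2}."""
    rng = random.Random(seed)
    low = cholesky([[1.0, r12, r1y], [r12, 1.0, r2y], [r1y, r2y, 1.0]])
    results = []
    for _ in range(reps):
        rows = []
        for _ in range(n):
            z = [rng.gauss(0.0, 1.0) for _ in range(3)]
            # x = L z has correlation matrix L L^T (order: x1, x2, y).
            rows.append([sum(low[i][k] * z[k] for k in range(i + 1))
                         for i in range(3)])
        r = sample_corr(rows)
        s12, s1y, s2y = r[0][1], r[0][2], r[1][2]
        r2_sq = (s1y ** 2 + s2y ** 2 - 2 * s1y * s2y * s12) / (1 - s12 ** 2)
        results.append(r2_sq - s1y ** 2)
    return results


# IS013B-like setting: 200 replications of N = 400.
ds = simulate_inclusive(0.50, 0.40, 0.30, n=400, reps=200)
print(f"mean d* = {sum(ds) / len(ds):.4f}, min d* = {min(ds):.6f}")
```

The mean should land near the δ* + Bias value for IS013B in Table B-2, up to simulation error.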

Results 

Table B-2 compares the means and standard deviations of squared-correlation differences with their theoretical values. Here $\delta^*$ is the population difference in squared multiple correlations and $d^*$ is its sample value. $\sigma_{d^*}$ is the theoretical asymptotic standard deviation of $d^*$ based on population values (Result 3, Equation 26), and $\mathrm{SDEV}_{d^*}$ is the standard deviation observed across the 1000 replications. Note the singularities in the first (IS000) sample, where the population multiple correlations are identical. Asymptotic normal theory breaks down in this case by predicting a zero value for $\sigma_{d^*}$. Column six measures the deviation of the mean of $d^*$ from its theoretical value; the denominator is the theoretical standard error of that mean across the 1000 replications. If the theory is correct, column six will be a normal (0, 1) deviate.

The sixth column of Table B-2 is a significance test for the difference between the mean of $d^*$ and its theoretical value. Except for the DJ010A set, there is no significant deviation of $d^*$ from its theoretical value. $\mathrm{SDEV}_{d^*}$ is compared with its theoretical value in the last two columns of Table B-2. All but the last two inclusive subset samples have significantly greater variance than asymptotic theory predicts. The disjoint sets are in substantial agreement with theory.

Table B-3 displays various measures of normality for $d^*$. All of the inclusive predictor subsets are non-normal, even for large samples, while all of the disjoint sets are normal.

Table B-4 shows the behavior of $\hat{\sigma}_{d^*}$, the sample estimates of the standard deviation of $d^*$, computed from Eq. (8) using each replication's sample correlations. Each replication has a different $\hat{\sigma}_{d^*}$. This should be compared with $\sigma_{d^*}$ in Table B-2, which uses population correlations in a similar formula of Result 3, and with $\mathrm{SDEV}_{d^*}$ in Table B-2, which is an observed value across replications. Column 6 shows the correlation between $d^*$ and $\hat{\sigma}_{d^*}$. In the usual sampling theory based on normal parent distributions, these would be expected to be independent, not correlated. This independence is essential for using Student's t-distribution to establish confidence intervals. Here the correlations are greater than .99 for all inclusive subsets. (Probably the only reason they are not 1.00 is rounding error in the values of $d^*$ and $\hat{\sigma}_{d^*}$, which were rounded to 4 digits.) The distribution of the T-statistic is shown in the right-most four columns of Table B-4. Here $d_w^*$ is the value of $d^*$ corrected for bias using the Wherry (1931) shrinkage formula. Normal theory would predict that T would have zero mean, unit standard deviation, and no skewness or kurtosis. The obtained values for the disjoint sets are in line with these expectations, but those for the inclusive subsets are not.


Table B-2. Means and Standard Deviations of Squared-Correlation Differences

Sample   N      δ*         δ*+Bias    Mean d*     z           σ_d*       SDEV_d*    SDEV²/σ²   p

Inclusive Predictor Sets
IS000    1000   0.000000   0.000841   0.000900    ∞           0.000000   0.001239   ∞          0
IS003A   100    0.002747   0.010280   0.010082    -0.690111   0.009050   0.013669   2.281392   <10^-?
IS003B   400    0.002747   0.004609   0.004767    1.106663    0.004525   0.005339   1.392384   <10^-?
IS003C   1000   0.002747   0.003490   0.003514    0.258851    0.002862   0.003086   1.162848   .0003
IS006A   100    0.005625   0.013949   0.013412    -1.242703   0.013670   0.017656   1.668217   <10^-?
IS006B   400    0.005625   0.007685   0.007488    -0.910621   0.006835   0.007296   1.139543   .0014
IS006C   1000   0.005625   0.006447   0.006422    -0.185840   0.004323   0.004555   1.110381   .0083
IS013A   100    0.013333   0.021404   0.022382    1.480195    0.020880   0.024662   1.395064   <10^-?
IS013B   400    0.013333   0.015330   0.015074    -0.774696   0.010440   0.010393   0.990961   .5744
IS013C   1000   0.013333   0.014130   0.014181    0.241779    0.006603   0.006455   0.955796   .8384

Disjoint Predictor Sets
DJ000    100    0.000000   0.000000   -0.000061   -0.185280   0.010360   0.010459   1.019283   .3288
DJ010A   100    0.010131   0.009914   0.004970    -2.520654   0.062030   0.064624   1.085380   .0307
DJ010B   400    0.010131   0.010077   0.009863    -0.217976   0.031015   0.029938   0.931754   .9390
DJ010C   1000   0.010131   0.010109   0.010460    0.565348    0.019616   0.019997   1.039268   .1891

Note: z = (mean d* - (δ* + Bias)) / (σ_d*/√1000). "<10^-?" marks p-values below a power of ten whose exponent is illegible in the source copy.


Table B-3. Normality of Squared-Correlation Differences

Sample   N      δ*      Skewness   Kurtosis   Kolmogorov D   p

Inclusive Predictor Sets
IS000    1000   0       2.3991     6.9438     .2338          <.01
IS003A   100    .0027   3.0044     16.5396    .2304          <.01
IS003B   400    .0027   1.8945     4.8198     .1860          <.01
IS003C   1000   .0027   1.4189     2.6474     .1274          <.01
IS006A   100    .0056   2.4313     8.1191     .2237          <.01
IS006B   400    .0056   1.7039     4.0798     .1524          <.01
IS006C   1000   .0056   1.0519     1.5267     .0826          <.01
IS013A   100    .0133   1.7557     3.6659     .1821          <.01
IS013B   400    .0133   .9537      1.267      .0735          <.01
IS013C   1000   .0133   .5916      .5317      .0442          <.01

Disjoint Predictor Sets
DJ000    100    0       .0325      .0058      .2874          >.15
DJ010A   100    .0101   .0872      .3230      .0263          .09
DJ010B   400    .0101   -.0180     .0241      .0197          >.15
DJ010C   1000   .0101   -.0255     .1784      .0197          >.15
Table B-4. Estimated Errors and T-ratios for Squared-Correlation Differences

Columns: Sample; N; δ*; σ̂_d* (Mean, Std. Dev.); r(d*, σ̂_d*); T = (δ* - d_w*)/σ̂_d* (Mean, Std. Dev., Skewness, Kurtosis)

Note: Several cells of this table are missing from the source copy. For each row, the surviving values are listed in their original left-to-right order; where values are missing, the columns they belong to cannot be determined with certainty. Sample labels in brackets are inferred from the row order used in Tables B-2 and B-3.

Inclusive Predictor Sets
[IS000]    .001395   .001034   .9992     1.5284    5.4964    8.4747    95.7577
[IS003A]   .013778   .009907   .9914     2.2943    9.7766    9.6427
[IS003B]   400       .003169   .9979     2.4773    15.4916   297.4342
[IS003C]   .9989     1.4652    6.1591    83.2858
[IS006A]   .016744   .011741   .9922     3.3358    15.0184   8.9495    95.4991
[IS006B]   400       .0056     .003715   .9982     1.9732    12.0017   15.0540   266.6279
[IS006C]   .001667   .8876     4.9319    14.4417   259.7843
IS013A     .022233   4.2551    32.4311   15.5740   289.0354
IS013B     400       .003907   .9971     .8270     4.2313    13.2275   218.1110
[IS013C]   .0133     .001565   .9979     .2781     9.6911

Disjoint Predictor Sets
DJ000      100       .0000     .010196   .001742   -.0408    .9976     -.0262    -.2136
DJ010A     100       .0101     .060657   .006639   .0274     .0888
DJ010B     400       .0101     .030891   .001664   .0018     .0070     .9685     -.0170
DJ010C     1000      .0101     .019588   .000657   .0067     -.0185    .0422     .1598

Finally, Table B-5 shows what would happen if one tried to base confidence intervals on the asymptotic estimates of variance. The middle three columns show the 5 percent, the median, and the 95 percent values of the T-statistic observed among the 1000 replications of a sample. Normal theory would predict these values to be -1.645, 0, and +1.645. The disjoint predictor samples come close, but the inclusive predictor subsets do not. The last two columns of Table B-5 show the number of replications in which $\delta^*$ falls outside of a "confidence interval" of $d^* \pm 1.96\hat{\sigma}_{d^*}$. Normal theory would expect 25 ± 10 at each end. The observed frequencies for the disjoint sets are close, but the inclusive subsets are grossly deviant from normal theory.

Discussion 

The asymptotic formulas derived in the main body of this report seem to work very well when the predictor sets are disjoint, but are less satisfactory, even in large samples, when one set includes another. Several alternative remedies could be tried:

Use the mean squared error (MSE) instead of multiple correlation. Sympson (1979) suggested this approach because, when adjusted for degrees of freedom, the difference in MSEs can be negative. However, the difference in such MSEs is proportional to the difference in Wherry-corrected squared multiple correlations (Wherry, 1931). Although not shown in the tables, the Wherry-corrected values had skewness and kurtosis values that agreed with the uncorrected values to two decimals in most of these simulations.

Transform the multiple correlations. A Fisher z-transform or other transform will still allow the difference to approach its population value with increasing sample size, while remaining non-negative. If the population difference is zero, the distribution of the difference cannot be normal.

Transform the difference in squared multiple correlations. Since the distributions are sometimes almost J-shaped, they are difficult to normalize. It would be desirable if a variance-stabilizing transformation could be found to eliminate the large correlation between $d^*$ and $\hat{\sigma}_{d^*}$ as well as normalize $d^*$.
Table B-5. Confidence Intervals for Squared-Correlation Differences

                        T = (δ* - d_w*)/σ̂_d*         Frequency
Sample   N      δ*      5%        50%      95%        δ* < d* - 1.96σ̂_d*   δ* > d* + 1.96σ̂_d*

Inclusive Predictor Sets
IS000    1000   .0000   -.7699    .3616    6.2172     6                     171
IS003A   100    .0027   -.8783    .4119    7.7085     1                     21
IS003B   400    .0027   -1.0196   .3106    8.0732     2                     227
IS003C   1000   .0027   -1.1226   .2455    6.3724     2                     159
IS006A   100    .0056   -.9414    .4984    10.0743    3                     228
IS006B   400    .0056   -1.0896   .3175    6.8493     2                     199
IS006C   1000   .0056   -1.1728   .2011    3.6934     4                     141
IS013A   100    .0133   -1.1432   .3493    10.1649    5                     207
IS013B   400    .0133   -1.1508   .1914    3.8499     3                     137
IS013C   1000   .0133   -1.2847   .0838    2.3916     5                     78

Disjoint Predictor Sets
DJ000    100    .0000   -1.6850   -.0253   1.6430     21                    19
DJ010A   100    .0101   -1.7213   .1089    1.9215     31                    48
DJ010B   400    .0101   -1.6067   .0261    1.6441     22                    26
DJ010C   1000   .0101   -1.6524   -.0467   1.6052     30                    26

Use  resampling  techniques.  One  could,  of  course,  abandon  attempts  to  obtain  explicit  mathematical 
expressions  for  the  sample  variance  of  multiple  correlation  differences  and  use  resampling  techniques  to 
estimate  variances  in  each  data  sample.  While  it  may  be  useful  in  practice,  such  an  approach  is  beyond  the 
scope  of  this  paper. 

Use  other  sampling  distributions.  The  inadequacy  of  the  above  approaches  almost  forces  us  to  use 
non-Gaussian  sampling  distributions.  These  are  outlined  in  the  next  section. 


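Non-normal Sampling Theory for Correlation Differences

Let $\delta = (\rho_2^2 - \rho_1^2)/(1 - \rho_2^2)$, with sample estimate $d = (R_2^2 - R_1^2)/(1 - R_2^2)$. ($\delta$ is often called the effect size.) Let $u = b - a$ and $v = N - b - 1$ be the degrees of freedom, where $a$ and $b$ are the numbers of predictors in the first and second sets. When $\delta = 0$, the distribution of $vd/u$ is exactly central $F_{u,v}$. When $\delta > 0$, the distribution is a non-central $F$ with non-centrality parameter $\lambda = (N - a)\delta$ (Cohen, 1988, p. 551).² The mean of $F$ is given by

$$\bar{F} = \frac{v(u + \lambda)}{u(v - 2)}. \qquad \text{(B-1)}$$

From this, it is readily seen that an unbiased estimate of $\delta$ is

$$\hat{\delta} = \frac{1}{N - a}\left[\frac{u(v - 2)}{v}F - u\right]. \qquad \text{(B-2)}$$

Or, in terms of $d$, an unbiased estimate of $\delta$ is

$$\hat{\delta} = \frac{(N - b - 3)d - b + a}{N - a}. \qquad \text{(B-3)}$$

²Lee (1971) has developed more accurate non-central F approximations for the special case when $a = 0$ by fitting the first three moments.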


For $N > 100$, the non-central $\chi^2$ will serve nearly as well. Here, $\chi^2 = vd$, with the same non-centrality parameter $\lambda$. The mean of a non-central $\chi^2$ is given by

$$\overline{\chi^2} = u + \lambda. \qquad \text{(B-4)}$$

Then the point estimate for $\delta$ is

$$\hat{\delta} = \frac{\chi^2 - u}{N - a}, \qquad \text{(B-5)}$$

or,

$$\hat{\delta} = \frac{(N - b - 1)d - b + a}{N - a}. \qquad \text{(B-6)}$$

In comparing this formula with the one for the non-central F, it is seen that the $\chi^2$-based estimate of $\delta$ is biased upward by $2d/(N - a)$. However, it is easier to average results across samples and compute confidence intervals with the non-central $\chi^2$ distribution than with the non-central F.
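Equations B-3 and B-6 are simple enough to check numerically. The Python sketch below uses a hypothetical sample (one predictor in the first set, two in the second, N = 400, $R_1^2 = .16$, $R_2^2 = .18$) and confirms that the two estimates differ by exactly $2d/(N - a)$. The function names are illustrative.

```python
def effect_size(r1_sq, r2_sq):
    """Sample effect size d = (R2^2 - R1^2) / (1 - R2^2)."""
    return (r2_sq - r1_sq) / (1.0 - r2_sq)


def delta_hat_f(d, n, a, b):
    """Unbiased estimate of delta from the non-central F mean (Equation B-3)."""
    return ((n - b - 3) * d - b + a) / (n - a)


def delta_hat_chi2(d, n, a, b):
    """Estimate of delta from the non-central chi-square mean (Equation B-6)."""
    return ((n - b - 1) * d - b + a) / (n - a)


# Hypothetical sample: a = 1 predictor in set 1, b = 2 in set 2, N = 400.
n, a, b = 400, 1, 2
d = effect_size(0.16, 0.18)
f_est, chi_est = delta_hat_f(d, n, a, b), delta_hat_chi2(d, n, a, b)
print(f"d = {d:.5f}, F-based = {f_est:.5f}, chi-square-based = {chi_est:.5f}")
print(f"difference = {chi_est - f_est:.6f}, 2d/(N - a) = {2 * d / (n - a):.6f}")
```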

Suppose that there are $k$ such $\chi^2$ populations with possibly different values of $\delta$. Then the sum

$$S = \sum_{i=1}^{k} (N_i - b - 1)\,d_i \qquad \text{(B-7)}$$

has a non-central $\chi^2$ distribution with $ku$ degrees of freedom and non-centrality parameter

$$\lambda = \sum_{i=1}^{k} (N_i - a)\,\delta_i. \qquad \text{(B-8)}$$

A weighted mean of the $\delta_i$ across the different populations can be defined as $\bar{\delta} = \sum_i (N_i - a)\delta_i / (N - ka)$, where $N = \sum_i N_i$. Then $\lambda = (N - ka)\bar{\delta}$, and a nearly unbiased estimate of $\bar{\delta}$ is

$$\hat{\bar{\delta}} = \frac{S - ku}{N - ka}. \qquad \text{(B-9)}$$

The value of $S$ can be used to compute the lower and upper 2.5 percent limits for a confidence interval $\lambda_{.025} \le \lambda \le \lambda_{.975}$ by means of Applied Statistics algorithm 170 (Narula & Desu, 1981). From this, it is evident that a 95 percent confidence interval can be established around the weighted mean of $\delta$ with

$$\frac{\lambda_{.025}}{N - ka} \le \bar{\delta} \le \frac{\lambda_{.975}}{N - ka}.$$
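The pooled estimate in Equations B-7 through B-9 can be sketched the same way. The limits $\lambda_{.025}$ and $\lambda_{.975}$ must come from AS 170 or another non-central chi-square routine, which is not reproduced here, so the interval helper below only performs the final division by $N - ka$. The inputs in the example are hypothetical, as are the function names.

```python
def pooled_delta(d, n, a, b):
    """Return S (Equation B-7) and the pooled estimate (S - k u)/(N - k a) (B-9).

    d: effect sizes d_i; n: sample sizes N_i; a, b: predictors in sets 1 and 2.
    """
    k, u = len(d), b - a
    s = sum((ni - b - 1) * di for di, ni in zip(d, n))
    return s, (s - k * u) / (sum(n) - k * a)


def delta_bar_interval(lam_lo, lam_hi, n, k, a):
    """Convert limits for lambda into limits for the weighted mean of delta."""
    denom = sum(n) - k * a
    return lam_lo / denom, lam_hi / denom


# Hypothetical: two samples of N = 101 and 201, with a = 1 and b = 2.
s, est = pooled_delta([0.02, 0.03], [101, 201], a=1, b=2)
print(f"S = {s:.3f}, pooled delta estimate = {est:.5f}")
```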
There is some difficulty in relating these results to $d^*$ itself. Since $d$ is a function of both $R_2^2$ and $d^*$, the relation between $d$ and $d^*$ is not one to one. The transformation $w = -\log(1 - R^2)$ may be useful here.³ Differentiating $w$ (or expanding it in a Taylor series) shows that, to a first-order approximation, $\Delta w \equiv w_2 - w_1 \approx d$. Thus it may be possible to develop all of the results in the $w$ metric rather than the $R$ metric.
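The first-order result can in fact be made exact. With $d = (R_2^2 - R_1^2)/(1 - R_2^2)$ as defined in this section and natural logarithms, a short derivation (added here for illustration) gives:

```latex
\begin{aligned}
\Delta w = w_2 - w_1
  &= \log(1 - R_1^2) - \log(1 - R_2^2)
   = \log\frac{1 - R_1^2}{1 - R_2^2} \\
  &= \log\!\left(1 + \frac{R_2^2 - R_1^2}{1 - R_2^2}\right)
   = \log(1 + d)
   = d - \frac{d^2}{2} + \frac{d^3}{3} - \cdots
\end{aligned}
```

So $\Delta w \approx d$ to first order. Because $\Delta w = \log(1 + d)$ is a monotone function of $d$, statements made in the $w$ metric translate exactly into statements about $d$.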


Simulation Results with Non-normal Sampling Distributions

Tables B-6 and B-7 compare the observed and theoretical values of the first four moments of the non-central F model of $d$ and the non-central $\chi^2$ model of $\Delta w$, respectively. It is evident that both models fit the first three moments rather well, and that the non-central F model fits the fourth moment better than the non-central $\chi^2$. Results for the non-central $\chi^2$ model of $d$ are not shown, but were no better than those shown in Table B-7; in fact, the ratios of observed to theoretical variances deviated slightly more from 1 than they did in Table B-7.

³It is interesting to note that Moschopoulos (1983) has shown that $w$, raised to a suitable power, is approximately Gaussian.


Table B-6. Observed vs. Theoretical Moments of Non-Central F Fitted to d

                         Mean              Variance                  Skewness          Kurtosis
Sample   N      δ*       Sample   Theory   Sample   Theory   Ratio   Sample   Theory   Sample   Theory
IS000    1000   0        0.96     1.00     2.02     2.01     1.01    2.75     2.84     9.66     12.15
IS003A   100    0.0027   1.43     1.39     3.91     3.72     1.05    2.54     2.74     8.20     11.32
IS003B   400    0.0027   2.56     2.48     8.51     8.02     1.06    1.86     1.99     3.95     5.56
IS003C   1000   0.0027   4.86     4.68     17.48    16.84    1.04    1.31     1.42     2.31     2.77
IS006A   100    0.0056   1.86     1.70     6.00     5.04     1.19    2.33     2.51     6.49     9.37
IS006B   400    0.0056   3.69     3.71     11.88    13.02    0.91    1.49     1.62     3.36     3.66
IS006C   1000   0.0056   7.82     7.75     29.89    29.24    1.02    1.06     1.10     1.28     1.66
IS013A   100    0.0133   2.58     2.65     8.41     9.08     0.93    1.87     2.04     4.32     6.06
IS013B   400    0.0133   7.49     7.47     28.74    28.45    1.01    1.13     1.15     1.80     1.83
IS013C   1000   0.0133   17.35    17.15    70.03    67.45    1.04    0.70     0.75     0.52     0.77

Table B-7. Observed vs. Theoretical Moments of Non-Central χ² fitted to Δlog(1−R²)

                           Mean              Variance            Skewness        Kurtosis
Sample      N      δ*  Sample  Theory  Sample  Theory   Ratio  Sample  Theory  Sample  Theory
IS000    1000       0    0.96    1.00    2.01    2.00    1.01    2.74    2.83    9.59   10.00
IS003A    100  0.0027    1.40    1.36    3.62    3.46    1.05    2.43    2.60    7.43    7.87
IS003B    400  0.0027    2.54    2.47    8.29    7.87    1.05    1.83    1.96    3.80    3.32
IS003C   1000  0.0027    4.84    4.68   17.22   16.70    1.03    1.30    1.41    2.22    0.70
IS006A    100  0.0056    1.82    1.67    5.47    4.66    1.17    2.23    2.38    5.79    6.09
IS006B    400  0.0056    3.66    3.68   11.51   12.72    0.90    1.45    1.59    3.12    1.48
IS006C   1000  0.0056    7.78    7.71   29.26   28.85    1.01    1.04    1.09    1.22   -0.39
IS013A    100  0.0133    2.50    2.58    7.58    8.34    0.91    1.76    1.91    3.68    3.07
IS013B    400  0.0133    7.39    7.38   27.29   27.54    0.99    1.08    1.12    1.59   -0.32
IS013C   1000  0.0133   17.17   16.98   67.28   65.94    1.02    0.68    0.73    0.46   -1.28

Applied Statistics algorithm 170 (Narula & Desu, 1981) for the non-central chi-square distribution was used to compute 95 percent confidence intervals based on the sample value of d in each replication. The numbers of replications (out of 1000) for which the population values of δ or Δw fell above or below the confidence intervals are tabulated in Table B-8. Applying the binomial distribution to these frequencies (N = 1000, p = .025), each observed frequency should lie in the range 25 ± 10 with probability of about .958; that is, about 95.8 percent of the ten simulated populations should fall within this range at the upper end, and likewise at the lower end. Frequencies that lie outside the range 25 ± 10 are marked with an asterisk.
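The 25 ± 10 rule follows from the binomial distribution. As a sketch (Python, standard library only), the code below computes the probability that a Binomial(1000, .025) count falls between 15 and 35 inclusive, then applies the asterisk rule to the two "N Below 2.5% Limit" columns of Table B-8. The text does not say whether the endpoints 15 and 35 were counted as inside the range, so the printed probability need not match the 95.8 percent figure exactly; the helper names are illustrative.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def prob_between(n, p, lo, hi):
    """P(lo <= X <= hi) for X ~ Binomial(n, p), endpoints included."""
    return sum(binom_pmf(k, n, p) for k in range(lo, hi + 1))


def flag(count, expected=25, half_width=10):
    """Asterisk rule: mark counts lying outside expected +/- half_width."""
    return "*" if abs(count - expected) > half_width else ""


if __name__ == "__main__":
    n, p = 1000, 0.025
    print("expected count per tail:", n * p)
    print("P(15 <= X <= 35):", round(prob_between(n, p, 15, 35), 3))
    # "N Below 2.5% Limit" columns of Table B-8, rows IS000 through IS013C
    below_d = [30, 27, 36, 27, 41, 21, 23, 27, 25, 28]
    below_dw = [30, 27, 33, 25, 34, 19, 23, 25, 23, 26]
    print("d model:  ", [f"{c}{flag(c)}" for c in below_d])
    print("Δw model: ", [f"{c}{flag(c)}" for c in below_dw])
```

Applied to these columns, the rule flags two populations (IS003B and IS006A) for the model of d and none for the model of Δw, matching the counts stated in the text.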

For the model of d, two of the ten populations had 95 percent confidence intervals that were too often above δ, and three of them were too often low. For the χ² model of Δw, none of the ten populations had 95 percent confidence intervals that were too often above the population value of Δw, and three of them were too often low.

When δ* = 0, no sample estimate of the upper 2.5 percent limit was less than the true value, zero. Every sample estimate of the lower 2.5 percent limit was greater than zero, usually in the eighth decimal place. This latter effect was corrected by subtracting 10⁻⁶ from the lower 2.5 percent limits computed by the program.

Table B-8. Frequency of Samples with Population Values Falling Outside 95% Confidence Intervals Based on Non-central Models

                                d                   Δlog(1−R²)
                          N Below     N Above     N Below     N Above
Sample      N      δ*  2.5% Limit 97.5% Limit  2.5% Limit 97.5% Limit
IS000    1000       0          30           0          30          0*
IS003A    100  0.0027          27          27          27          27
IS003B    400  0.0027         36*          32          33          32
IS003C   1000  0.0027          27          21          25          21
IS006A    100  0.0056         41*         13*          34         13*
IS006B    400  0.0056          21          29          19          29
IS006C   1000  0.0056          23          22          23          22
IS013A    100  0.0133          27         39*          25         39*
IS013B    400  0.0133          25          25          23          25
IS013C   1000  0.0133          28          21          26          21

Although the approximations for the confidence intervals do not perform in these simulations as well as might be expected of non-central F formulas, they work quite well for the largest sample sizes. In any case, they are a substantial improvement over the normality-based confidence intervals in Table B-5.

Example 

Table B-9 shows the application of these methods to the illustrative data in Tables 7, 10, and 11 in the main body of the report. The confidence intervals and unbiased estimates are based on the non-central chi-square distribution. The p_i values in the right-most column are based on the central chi-square distribution and are close to the F probabilities in Table 7. The probability for the combined sample, .004, agrees with the Fisher value previously given.
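Intervals of this kind come from inverting the non-central χ² distribution; the report uses Applied Statistics algorithm 170 for that step. The Python sketch below is a simple stand-in, not that algorithm. It writes the non-central χ² CDF as a Poisson-weighted mixture of central χ² CDFs, then uses bisection to find the non-centrality values λ that place the observed statistic at the 97.5th and 2.5th percentiles, giving a 95 percent interval for λ. It also returns x − k, which is unbiased for λ because E[X] = k + λ, and the central χ² upper-tail probability, the analogue of p_i. The conversion from the λ scale to the effect-size scale of Table B-9 is not given in this section and is left out; all function names are hypothetical.

```python
import math


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x) by its power series (fine for moderate x)."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_cdf(x, k):
    """Central chi-square CDF with k degrees of freedom."""
    return reg_lower_gamma(k / 2.0, x / 2.0)


def ncx2_cdf(x, k, lam, tol=1e-12):
    """Non-central chi-square CDF as a Poisson(lam/2) mixture of chi2_cdf(x, k + 2j)."""
    if x <= 0.0:
        return 0.0
    if lam == 0.0:
        return chi2_cdf(x, k)
    half = lam / 2.0
    total = weight_sum = 0.0
    max_terms = int(half + 40.0 * math.sqrt(half + 1.0)) + 100
    for j in range(max_terms):
        w = math.exp(-half + j * math.log(half) - math.lgamma(j + 1))
        total += w * chi2_cdf(x, k + 2 * j)
        weight_sum += w
        if j > half and 1.0 - weight_sum < tol:
            break  # remaining Poisson mass (an upper bound on the error) is negligible
    return total


def solve_noncentrality(x, k, prob):
    """Find lam >= 0 with ncx2_cdf(x, k, lam) == prob; the CDF decreases as lam grows."""
    if ncx2_cdf(x, k, 0.0) <= prob:
        return 0.0  # even lam = 0 puts x at or below this percentile
    lo, hi = 0.0, 1.0
    while ncx2_cdf(x, k, hi) > prob:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if ncx2_cdf(x, k, mid) > prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def noncentrality_summary(x, k, level=0.95):
    """Unbiased estimate x - k, confidence limits for lam, and central upper-tail p-value."""
    alpha = 1.0 - level
    lower = solve_noncentrality(x, k, 1.0 - alpha / 2.0)
    upper = solve_noncentrality(x, k, alpha / 2.0)
    p_value = 1.0 - chi2_cdf(x, k)
    return x - k, lower, upper, p_value


if __name__ == "__main__":
    est, lo, hi, p = noncentrality_summary(3.841458820694124, 1)
    print(f"unbiased {est:.3f}  95% CI [{lo:.3f}, {hi:.3f}]  p {p:.3f}")
```

The unbiased estimate x − k can be negative, while the lower limit is clipped at zero; this is why an unbiased estimate and a confidence limit can differ noticeably in small samples.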


Table B-9. Ninety-Five Percent Confidence Intervals for Effect Sizes


School 

Effect  Size 

d* 

Unbiased  Effect 

Lower  Limit 

2.5%

Upper  Limit 

97.5% 

p_i

Air  Traffic  Controller 

.538 

Fire  Control  Technician 

.0228 

Gunner’s  Mate 

Electrician’s  Mate 

.0007 

.0000 

.0337 

Combined  Sample 

.0319 

.0130 

.0027 

.0283 

.004 
Conclusion

Asymptotic normal theory works well when the predictor sets are disjoint. When one predictor set includes another, non-central F or chi-square distributions may be used to establish confidence intervals for the effect sizes in each sample and for the mean effect size across different samples. Unfortunately, it is not clear how to test hypotheses concerning differences among populations. Adjusting effect sizes for the artifacts of range restriction or criterion unreliability will require further research.